[GitHub] [flink] flinkbot edited a comment on pull request #12727: [FLINK-17292][docs] Translate Fault Tolerance training lesson to Chinese
flinkbot edited a comment on pull request #12727: URL: https://github.com/apache/flink/pull/12727#issuecomment-647010602

## CI report:

* e57e583260e23ec0c0b4bbbdabd51ec2a08ac726 Azure: [SUCCESS](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=4018)
* 5a76928eaf6355d22fb655a821cb5f922f560fe2 UNKNOWN
* ed389accd5133d94808984cbbe573cc89517beb1 Azure: [PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=4026)

Bot commands: The @flinkbot bot supports the following commands:

- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [flink-web] carp84 closed pull request #344: [blog] flink on zeppelin - part2
carp84 closed pull request #344: URL: https://github.com/apache/flink-web/pull/344
[GitHub] [flink] flinkbot edited a comment on pull request #12727: [FLINK-17292][docs] Translate Fault Tolerance training lesson to Chinese
flinkbot edited a comment on pull request #12727: URL: https://github.com/apache/flink/pull/12727#issuecomment-647010602

## CI report:

* e57e583260e23ec0c0b4bbbdabd51ec2a08ac726 Azure: [SUCCESS](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=4018)
* 5a76928eaf6355d22fb655a821cb5f922f560fe2 UNKNOWN
* ed389accd5133d94808984cbbe573cc89517beb1 UNKNOWN
[GitHub] [flink] flinkbot edited a comment on pull request #12727: [FLINK-17292][docs] Translate Fault Tolerance training lesson to Chinese
flinkbot edited a comment on pull request #12727: URL: https://github.com/apache/flink/pull/12727#issuecomment-647010602

## CI report:

* e57e583260e23ec0c0b4bbbdabd51ec2a08ac726 Azure: [SUCCESS](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=4018)
* 5a76928eaf6355d22fb655a821cb5f922f560fe2 UNKNOWN
[GitHub] [flink] XBaith commented on pull request #12727: [FLINK-17292][docs] Translate Fault Tolerance training lesson to Chinese
XBaith commented on pull request #12727: URL: https://github.com/apache/flink/pull/12727#issuecomment-649224082 Thank you @RocMarshal. LGTM.
[GitHub] [flink] XBaith commented on a change in pull request #12727: [FLINK-17292][docs] Translate Fault Tolerance training lesson to Chinese
XBaith commented on a change in pull request #12727: URL: https://github.com/apache/flink/pull/12727#discussion_r445307991 ## File path: docs/learn-flink/fault_tolerance.zh.md ## @@ -29,180 +29,137 @@ under the License. ## State Backends -The keyed state managed by Flink is a sort of sharded, key/value store, and the working copy of each -item of keyed state is kept somewhere local to the taskmanager responsible for that key. Operator -state is also local to the machine(s) that need(s) it. Flink periodically takes persistent snapshots -of all the state and copies these snapshots somewhere more durable, such as a distributed file -system. +由 Flink 管理的 keyed state 是一种分片的键/值存储,每个 keyed state 的工作副本都保存在负责该键的 taskmanager 本地中。另外,Operator state 也保存在机器节点本地。Flink 定期获取所有状态的快照,并将这些快照复制到持久化的位置,例如分布式文件系统。 -In the event of the failure, Flink can restore the complete state of your application and resume -processing as though nothing had gone wrong. +如果发生故障,Flink 可以恢复应用程序的完整状态并继续处理,就如同没有出现过异常。 -This state that Flink manages is stored in a _state backend_. Two implementations of state backends -are available -- one based on RocksDB, an embedded key/value store that keeps its working state on -disk, and another heap-based state backend that keeps its working state in memory, on the Java heap. -This heap-based state backend comes in two flavors: the FsStateBackend that persists its state -snapshots to a distributed file system, and the MemoryStateBackend that uses the JobManager's heap. 
+Flink 管理的状态存储在 _state backend_ 中。Flink 有两种 state backend 的实现 -- 一种基于 RocksDB 内嵌 key/value 存储将其工作状态保存在磁盘上的,另一种基于堆的 state backend,将其工作状态保存在 Java 的堆内存中。这种基于堆的 state backend 有两种类型:FsStateBackend,将其状态快照持久化到分布式文件系统;MemoryStateBackend,它使用 JobManager 的堆保存状态快照。 - Name + 名称 Working State - State Backup - Snapshotting + 状态备份 + 快照 RocksDBStateBackend - Local disk (tmp dir) - Distributed file system - Full / Incremental + 本地磁盘(tmp dir) + 分布式文件系统 + 全量 / 增量 - Supports state larger than available memory - Rule of thumb: 10x slower than heap-based backends + 支持大于内存大小的状态 + 经验法则:比基于堆的后端慢10倍 FsStateBackend JVM Heap - Distributed file system - Full + 分布式文件系统 + 全量 - Fast, requires large heap - Subject to GC + 快速,需要大的堆内存 + 受限制于 GC MemoryStateBackend JVM Heap JobManager JVM Heap - Full + 全量 - Good for testing and experimentation with small state (locally) + 适用于小状态(本地)的测试和实验 -When working with state kept in a heap-based state backend, accesses and updates involve reading and -writing objects on the heap. But for objects kept in the `RocksDBStateBackend`, accesses and updates -involve serialization and deserialization, and so are much more expensive. But the amount of state -you can have with RocksDB is limited only by the size of the local disk. Note also that only the -`RocksDBStateBackend` is able to do incremental snapshotting, which is a significant benefit for -applications with large amounts of slowly changing state. +当使用基于堆的 state backend 保存状态时,访问和更新涉及在堆上读写对象。但是对于保存在 `RocksDBStateBackend` 中的对象,访问和更新涉及序列化和反序列化,所以会有更大的开销。但 RocksDB 的状态量仅受本地磁盘大小的限制。还要注意,只有 `RocksDBStateBackend` 能够进行增量快照,这对于具有大量变化缓慢状态的应用程序来说是大有裨益的。 -All of these state backends are able to do asynchronous snapshotting, meaning that they can take a -snapshot without impeding the ongoing stream processing. 
+所有这些 state backends 都能够异步执行快照,这意味着它们可以在不妨碍正在进行的流处理的情况下执行快照。 {% top %} -## State Snapshots +## 状态快照 -### Definitions +### 定义 -* _Snapshot_ -- a generic term referring to a global, consistent image of the state of a Flink job. - A snapshot includes a pointer into each of the data sources (e.g., an offset into a file or Kafka - partition), as well as a copy of the state from each of the job's stateful operators that resulted - from having processed all of the events up to those positions in the sources. -* _Checkpoint_ -- a snapshot taken automatically by Flink for the purpose of being able to recover - from faults. Checkpoints can be incremental, and are optimized for being restored quickly. -* _Externalized Checkpoint_ -- normally checkpoints are not intended to be manipulated by users. - Flink retains only the _n_-most-recent checkpoints (_n_ being configurable) while a job is - running, and deletes them when a job is cancelled. But you can configure them to be retained - instead, in which case you can manually resume from them. -* _Savepoint_ -- a snapshot triggered manually by a user (or an API call) for some operational - purpose, such as a stateful redeploy/upgrade/rescaling ope
[GitHub] [flink] RocMarshal removed a comment on pull request #12727: [FLINK-17292][docs] Translate Fault Tolerance training lesson to Chinese
RocMarshal removed a comment on pull request #12727: URL: https://github.com/apache/flink/pull/12727#issuecomment-649217684 @klion26 , @XBaith Thank you for your help. I have made some alterations according to your suggestions, please take a look.
[GitHub] [flink] RocMarshal commented on pull request #12727: [FLINK-17292][docs] Translate Fault Tolerance training lesson to Chinese
RocMarshal commented on pull request #12727: URL: https://github.com/apache/flink/pull/12727#issuecomment-649218289 @klion26 , @XBaith Thank you for your help. I have made some alterations according to your suggestions, please take a look.
[GitHub] [flink] RocMarshal closed pull request #12727: [FLINK-17292][docs] Translate Fault Tolerance training lesson to Chinese
RocMarshal closed pull request #12727: URL: https://github.com/apache/flink/pull/12727
[GitHub] [flink] RocMarshal commented on pull request #12727: [FLINK-17292][docs] Translate Fault Tolerance training lesson to Chinese
RocMarshal commented on pull request #12727: URL: https://github.com/apache/flink/pull/12727#issuecomment-649217684 @klion26 , @XBaith Thank you for your help. I have made some alterations according to your suggestions, please take a look.
[GitHub] [flink] flinkbot edited a comment on pull request #12768: [FLINK-17804][parquet] Follow Parquet spec when decoding DECIMAL
flinkbot edited a comment on pull request #12768: URL: https://github.com/apache/flink/pull/12768#issuecomment-649142405

## CI report:

* 066e41b9e06caf58e4ef5a8fbe6ea49b0bc9168e Azure: [SUCCESS](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=4025)
[jira] [Commented] (FLINK-18433) From the end-to-end performance test results, 1.11 has a regression
[ https://issues.apache.org/jira/browse/FLINK-18433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17144611#comment-17144611 ]

Congxian Qiu(klion26) commented on FLINK-18433:
---
[~Aihua] thanks for the work, maybe the affected version should be 1.11.0?

> From the end-to-end performance test results, 1.11 has a regression
> ---
> Key: FLINK-18433
> URL: https://issues.apache.org/jira/browse/FLINK-18433
> Project: Flink
> Issue Type: Bug
> Components: API / Core, API / DataStream
> Affects Versions: 1.11.1
> Environment: 3 machines
> [|https://github.com/Li-Aihua/flink/blob/test_suite_for_basic_operations_1.11/flink-end-to-end-perf-tests/flink-basic-operations/src/main/java/org/apache/flink/basic/operations/PerformanceTestJob.java]
> Reporter: Aihua Li
> Priority: Major
>
> I ran end-to-end performance tests between Release-1.10 and Release-1.11. The results were as follows:
>
> |scenarioName|release-1.10|release-1.11|regression|
> |OneInput_Broadcast_LazyFromSource_ExactlyOnce_10_rocksdb|46.175|43.8133|-5.11%|
> |OneInput_Rescale_LazyFromSource_ExactlyOnce_100_heap|211.835|200.355|-5.42%|
> |OneInput_Rebalance_LazyFromSource_ExactlyOnce_1024_rocksdb|1721.041667|1618.32|-5.97%|
> |OneInput_KeyBy_LazyFromSource_ExactlyOnce_10_heap|46|43.615|-5.18%|
> |OneInput_Broadcast_Eager_ExactlyOnce_100_rocksdb|212.105|199.688|-5.85%|
> |OneInput_Rescale_Eager_ExactlyOnce_1024_heap|1754.64|1600.12|-8.81%|
> |OneInput_Rebalance_Eager_ExactlyOnce_10_rocksdb|45.9167|43.0983|-6.14%|
> |OneInput_KeyBy_Eager_ExactlyOnce_100_heap|212.0816667|200.727|-5.35%|
> |OneInput_Broadcast_LazyFromSource_AtLeastOnce_1024_rocksdb|1718.245|1614.381667|-6.04%|
> |OneInput_Rescale_LazyFromSource_AtLeastOnce_10_heap|46.12|43.5517|-5.57%|
> |OneInput_Rebalance_LazyFromSource_AtLeastOnce_100_rocksdb|212.038|200.388|-5.49%|
> |OneInput_KeyBy_LazyFromSource_AtLeastOnce_1024_heap|1762.048333|1606.408333|-8.83%|
> |OneInput_Broadcast_Eager_AtLeastOnce_10_rocksdb|46.0583|43.4967|-5.56%|
> |OneInput_Rescale_Eager_AtLeastOnce_100_heap|212.233|201.188|-5.20%|
> |OneInput_Rebalance_Eager_AtLeastOnce_1024_rocksdb|1720.66|1616.85|-6.03%|
> |OneInput_KeyBy_Eager_AtLeastOnce_10_heap|46.14|43.6233|-5.45%|
> |TwoInputs_Broadcast_LazyFromSource_ExactlyOnce_100_rocksdb|156.918|152.957|-2.52%|
> |TwoInputs_Rescale_LazyFromSource_ExactlyOnce_1024_heap|1415.511667|1300.1|-8.15%|
> |TwoInputs_Rebalance_LazyFromSource_ExactlyOnce_10_rocksdb|34.2967|34.1667|-0.38%|
> |TwoInputs_KeyBy_LazyFromSource_ExactlyOnce_100_heap|158.353|151.848|-4.11%|
> |TwoInputs_Broadcast_Eager_ExactlyOnce_1024_rocksdb|1373.406667|1300.056667|-5.34%|
> |TwoInputs_Rescale_Eager_ExactlyOnce_10_heap|34.5717|32.0967|-7.16%|
> |TwoInputs_Rebalance_Eager_ExactlyOnce_100_rocksdb|158.655|147.44|-7.07%|
> |TwoInputs_KeyBy_Eager_ExactlyOnce_1024_heap|1356.611667|1292.386667|-4.73%|
> |TwoInputs_Broadcast_LazyFromSource_AtLeastOnce_10_rocksdb|34.01|33.205|-2.37%|
> |TwoInputs_Rescale_LazyFromSource_AtLeastOnce_100_heap|149.588|145.997|-2.40%|
> |TwoInputs_Rebalance_LazyFromSource_AtLeastOnce_1024_rocksdb|1359.74|1299.156667|-4.46%|
> |TwoInputs_KeyBy_LazyFromSource_AtLeastOnce_10_heap|34.025|29.6833|-12.76%|
> |TwoInputs_Broadcast_Eager_AtLeastOnce_100_rocksdb|157.303|151.4616667|-3.71%|
> |TwoInputs_Rescale_Eager_AtLeastOnce_1024_heap|1368.74|1293.238333|-5.52%|
> |TwoInputs_Rebalance_Eager_AtLeastOnce_10_rocksdb|34.325|33.285|-3.03%|
> |TwoInputs_KeyBy_Eager_AtLeastOnce_100_heap|162.5116667|134.375|-17.31%|
> It can be seen that the performance of 1.11 has a regression, basically around 5%, and the maximum regression is 17%. This needs to be checked.
> the test code:
> flink-1.10.0: [https://github.com/Li-Aihua/flink/blob/test_suite_for_basic_operations/flink-end-to-end-perf-tests/flink-basic-operations/src/main/java/org/apache/flink/basic/operations/PerformanceTestJob.java]
> flink-1.11.0: [https://github.com/Li-Aihua/flink/blob/test_suite_for_basic_operations_1.11/flink-end-to-end-perf-tests/flink-basic-operations/src/main/java/org/apache/flink/basic/operations/PerformanceTestJob.java]
> the submit command was like this:
> bin/flink run -d -m 192.168.39.246:8081 -c org.apache.flink.basic.operations.PerformanceTestJob /home/admin/flink-basic-operations_2.11-1.10-SNAPSHOT.jar --topologyName OneInput --LogicalAttributesofEdges Broadcast --ScheduleMode LazyFromSource --CheckpointMode ExactlyOnce --recordSize 10 --stateBackend rocksdb

-- This message was sent by Atlassian Jira (v8.3.4#803005)
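The last column of the results table quoted above is, presumably, the relative change from release-1.10 to release-1.11 (the ticket does not state the formula, so this is an assumption). A one-line sketch reproduces the reported figures:

```python
def regression(v110, v111):
    """Relative change from release-1.10 to release-1.11, in percent.

    Assumed formula: (new - old) / old * 100; not stated in the ticket.
    """
    return (v111 - v110) / v110 * 100

# First and worst rows of the quoted table:
print(round(regression(46.175, 43.8133), 2))       # -5.11
print(round(regression(162.5116667, 134.375), 2))  # -17.31
```

Both values match the table, which supports reading the column as a percentage slowdown relative to 1.10.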
[GitHub] [flink] klion26 commented on a change in pull request #12727: [FLINK-17292][docs] Translate Fault Tolerance training lesson to Chinese
klion26 commented on a change in pull request #12727: URL: https://github.com/apache/flink/pull/12727#discussion_r445279543 ## File path: docs/learn-flink/fault_tolerance.zh.md ## @@ -29,180 +29,137 @@ under the License. ## State Backends -The keyed state managed by Flink is a sort of sharded, key/value store, and the working copy of each -item of keyed state is kept somewhere local to the taskmanager responsible for that key. Operator -state is also local to the machine(s) that need(s) it. Flink periodically takes persistent snapshots -of all the state and copies these snapshots somewhere more durable, such as a distributed file -system. +由 Flink 管理的 keyed state 是一种分片的键/值存储,每个 keyed state 的工作副本都保存在负责该键的 taskmanager 本地的某个位置。Operator state 也存在机器节点本地。Flink 定期获取所有状态的快照,并将这些快照复制到持久化的位置,例如分布式文件系统。 -In the event of the failure, Flink can restore the complete state of your application and resume -processing as though nothing had gone wrong. +如果发生故障,Flink 可以恢复应用程序的完整状态并继续处理,就如同没有出现过异常。 -This state that Flink manages is stored in a _state backend_. Two implementations of state backends -are available -- one based on RocksDB, an embedded key/value store that keeps its working state on -disk, and another heap-based state backend that keeps its working state in memory, on the Java heap. -This heap-based state backend comes in two flavors: the FsStateBackend that persists its state -snapshots to a distributed file system, and the MemoryStateBackend that uses the JobManager's heap. 
+Flink 管理的状态存储在 _state backend_ 中。Flink 有两种 state backend 的实现 -- 一种基于 RocksDB 内嵌 key/value 存储将其工作状态保存在磁盘上的,另一种基于堆的 state backend,将其工作状态保存在 Java 的堆内存中。这种基于堆的 state backend 有两种类型:FsStateBackend,将其状态快照持久化到分布式文件系统;MemoryStateBackend,它使用 JobManager 的堆保存状态快照。 - Name + 名称 Working State - State Backup - Snapshotting + 状态备份 + 快照 RocksDBStateBackend - Local disk (tmp dir) - Distributed file system - Full / Incremental + 本地磁盘(tmp dir) + 分布式文件系统 + 全量 / 增量 - Supports state larger than available memory - Rule of thumb: 10x slower than heap-based backends + 支持大于内存大小的状态 + 经验法则:比基于堆的后端慢10倍 FsStateBackend JVM Heap - Distributed file system - Full + 分布式文件系统 + 全量 - Fast, requires large heap - Subject to GC + 快速,需要大的堆内存 + 受限制于 GC MemoryStateBackend JVM Heap JobManager JVM Heap - Full + 全量 - Good for testing and experimentation with small state (locally) + 适用于小状态(本地)的测试和实验 -When working with state kept in a heap-based state backend, accesses and updates involve reading and -writing objects on the heap. But for objects kept in the `RocksDBStateBackend`, accesses and updates -involve serialization and deserialization, and so are much more expensive. But the amount of state -you can have with RocksDB is limited only by the size of the local disk. Note also that only the -`RocksDBStateBackend` is able to do incremental snapshotting, which is a significant benefit for -applications with large amounts of slowly changing state. +当使用基于堆的 state backend 保存状态时,访问和更新涉及在堆上读写对象。但是对于保存在 `RocksDBStateBackend` 中的对象,访问和更新涉及序列化和反序列化,所以会有更大的开销。但 RocksDB 的状态量仅受本地磁盘大小的限制。还要注意,只有 `RocksDBStateBackend` 能够进行增量快照,这对于具有大量变化缓慢状态的应用程序来说是大有裨益的。 -All of these state backends are able to do asynchronous snapshotting, meaning that they can take a -snapshot without impeding the ongoing stream processing. 
+所有这些 state backends 都能够异步执行快照,这意味着它们可以在不妨碍正在进行的流处理的情况下执行快照。 {% top %} -## State Snapshots +## 状态快照 -### Definitions +### 定义 -* _Snapshot_ -- a generic term referring to a global, consistent image of the state of a Flink job. - A snapshot includes a pointer into each of the data sources (e.g., an offset into a file or Kafka - partition), as well as a copy of the state from each of the job's stateful operators that resulted - from having processed all of the events up to those positions in the sources. -* _Checkpoint_ -- a snapshot taken automatically by Flink for the purpose of being able to recover - from faults. Checkpoints can be incremental, and are optimized for being restored quickly. -* _Externalized Checkpoint_ -- normally checkpoints are not intended to be manipulated by users. - Flink retains only the _n_-most-recent checkpoints (_n_ being configurable) while a job is - running, and deletes them when a job is cancelled. But you can configure them to be retained - instead, in which case you can manually resume from them. -* _Savepoint_ -- a snapshot triggered manually by a user (or an API call) for some operational - purpose, such as a stateful redeploy/upgrade/rescaling op
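The diff above notes that a heap-based backend reads and writes live objects, while state in `RocksDBStateBackend` pays serialization and deserialization on every access. A minimal sketch of that difference (hypothetical classes, not Flink's API; `pickle` stands in for Flink's serializers):

```python
import pickle

class HeapStore:
    """Heap-style access: values are live objects, no copy cost."""
    def __init__(self):
        self._data = {}
    def put(self, k, v):
        self._data[k] = v             # store a plain object reference
    def get(self, k):
        return self._data[k]          # same object back, no copying

class SerializingStore:
    """RocksDB-style access: pay (de)serialization on every operation."""
    def __init__(self):
        self._data = {}
    def put(self, k, v):
        self._data[k] = pickle.dumps(v)     # serialize on every update
    def get(self, k):
        return pickle.loads(self._data[k])  # deserialize on every read

heap, rocks = HeapStore(), SerializingStore()
heap.put("count", [1, 2, 3])
rocks.put("count", [1, 2, 3])
print(heap.get("count") == rocks.get("count"))  # True: same logical value
print(heap.get("count") is heap.get("count"))   # True: the live heap object
print(rocks.get("count") is rocks.get("count")) # False: a fresh copy each read
```

The extra copy per access is the overhead the translated paragraph describes; in exchange, a disk-backed store like RocksDB is bounded by local disk rather than heap size.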
[GitHub] [flink] klion26 commented on a change in pull request #12727: [FLINK-17292][docs] Translate Fault Tolerance training lesson to Chinese
klion26 commented on a change in pull request #12727: URL: https://github.com/apache/flink/pull/12727#discussion_r445280354 ## File path: docs/learn-flink/fault_tolerance.zh.md ## @@ -29,180 +29,137 @@ under the License. ## State Backends -The keyed state managed by Flink is a sort of sharded, key/value store, and the working copy of each -item of keyed state is kept somewhere local to the taskmanager responsible for that key. Operator -state is also local to the machine(s) that need(s) it. Flink periodically takes persistent snapshots -of all the state and copies these snapshots somewhere more durable, such as a distributed file -system. +由 Flink 管理的 keyed state 是一种分片的键/值存储,每个 keyed state 的工作副本都保存在负责该键的 taskmanager 本地中。另外,Operator state 也保存在机器节点本地。Flink 定期获取所有状态的快照,并将这些快照复制到持久化的位置,例如分布式文件系统。 -In the event of the failure, Flink can restore the complete state of your application and resume -processing as though nothing had gone wrong. +如果发生故障,Flink 可以恢复应用程序的完整状态并继续处理,就如同没有出现过异常。 -This state that Flink manages is stored in a _state backend_. Two implementations of state backends -are available -- one based on RocksDB, an embedded key/value store that keeps its working state on -disk, and another heap-based state backend that keeps its working state in memory, on the Java heap. -This heap-based state backend comes in two flavors: the FsStateBackend that persists its state -snapshots to a distributed file system, and the MemoryStateBackend that uses the JobManager's heap. 
+Flink 管理的状态存储在 _state backend_ 中。Flink 有两种 state backend 的实现 -- 一种基于 RocksDB 内嵌 key/value 存储将其工作状态保存在磁盘上的,另一种基于堆的 state backend,将其工作状态保存在 Java 的堆内存中。这种基于堆的 state backend 有两种类型:FsStateBackend,将其状态快照持久化到分布式文件系统;MemoryStateBackend,它使用 JobManager 的堆保存状态快照。 - Name + 名称 Working State - State Backup - Snapshotting + 状态备份 + 快照 RocksDBStateBackend - Local disk (tmp dir) - Distributed file system - Full / Incremental + 本地磁盘(tmp dir) + 分布式文件系统 + 全量 / 增量 - Supports state larger than available memory - Rule of thumb: 10x slower than heap-based backends + 支持大于内存大小的状态 + 经验法则:比基于堆的后端慢10倍 FsStateBackend JVM Heap - Distributed file system - Full + 分布式文件系统 + 全量 - Fast, requires large heap - Subject to GC + 快速,需要大的堆内存 + 受限制于 GC MemoryStateBackend JVM Heap JobManager JVM Heap - Full + 全量 - Good for testing and experimentation with small state (locally) + 适用于小状态(本地)的测试和实验 -When working with state kept in a heap-based state backend, accesses and updates involve reading and -writing objects on the heap. But for objects kept in the `RocksDBStateBackend`, accesses and updates -involve serialization and deserialization, and so are much more expensive. But the amount of state -you can have with RocksDB is limited only by the size of the local disk. Note also that only the -`RocksDBStateBackend` is able to do incremental snapshotting, which is a significant benefit for -applications with large amounts of slowly changing state. +当使用基于堆的 state backend 保存状态时,访问和更新涉及在堆上读写对象。但是对于保存在 `RocksDBStateBackend` 中的对象,访问和更新涉及序列化和反序列化,所以会有更大的开销。但 RocksDB 的状态量仅受本地磁盘大小的限制。还要注意,只有 `RocksDBStateBackend` 能够进行增量快照,这对于具有大量变化缓慢状态的应用程序来说是大有裨益的。 -All of these state backends are able to do asynchronous snapshotting, meaning that they can take a -snapshot without impeding the ongoing stream processing. 
+所有这些 state backends 都能够异步执行快照,这意味着它们可以在不妨碍正在进行的流处理的情况下执行快照。 {% top %} -## State Snapshots +## 状态快照 -### Definitions +### 定义 -* _Snapshot_ -- a generic term referring to a global, consistent image of the state of a Flink job. - A snapshot includes a pointer into each of the data sources (e.g., an offset into a file or Kafka - partition), as well as a copy of the state from each of the job's stateful operators that resulted - from having processed all of the events up to those positions in the sources. -* _Checkpoint_ -- a snapshot taken automatically by Flink for the purpose of being able to recover - from faults. Checkpoints can be incremental, and are optimized for being restored quickly. -* _Externalized Checkpoint_ -- normally checkpoints are not intended to be manipulated by users. - Flink retains only the _n_-most-recent checkpoints (_n_ being configurable) while a job is - running, and deletes them when a job is cancelled. But you can configure them to be retained - instead, in which case you can manually resume from them. -* _Savepoint_ -- a snapshot triggered manually by a user (or an API call) for some operational - purpose, such as a stateful redeploy/upgrade/rescaling op
[jira] [Updated] (FLINK-18433) From the end-to-end performance test results, 1.11 has a regression
[ https://issues.apache.org/jira/browse/FLINK-18433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aihua Li updated FLINK-18433: - Summary: From the end-to-end performance test results, 1.11 has a regression (was: From the end-to-end performance test results, 1.11 has a ) > From the end-to-end performance test results, 1.11 has a regression > --- > > Key: FLINK-18433 > URL: https://issues.apache.org/jira/browse/FLINK-18433 > Project: Flink > Issue Type: Bug > Components: API / Core, API / DataStream >Affects Versions: 1.11.1 > Environment: 3 machines > [|https://github.com/Li-Aihua/flink/blob/test_suite_for_basic_operations_1.11/flink-end-to-end-perf-tests/flink-basic-operations/src/main/java/org/apache/flink/basic/operations/PerformanceTestJob.java] >Reporter: Aihua Li >Priority: Major > > > I ran end-to-end performance tests between the Release-1.10 and Release-1.11. > the results were as follows: > |scenarioName|release-1.10|release-1.11| | > |OneInput_Broadcast_LazyFromSource_ExactlyOnce_10_rocksdb|46.175|43.8133|-5.11%| > |OneInput_Rescale_LazyFromSource_ExactlyOnce_100_heap|211.835|200.355|-5.42%| > |OneInput_Rebalance_LazyFromSource_ExactlyOnce_1024_rocksdb|1721.041667|1618.32|-5.97%| > |OneInput_KeyBy_LazyFromSource_ExactlyOnce_10_heap|46|43.615|-5.18%| > |OneInput_Broadcast_Eager_ExactlyOnce_100_rocksdb|212.105|199.688|-5.85%| > |OneInput_Rescale_Eager_ExactlyOnce_1024_heap|1754.64|1600.12|-8.81%| > |OneInput_Rebalance_Eager_ExactlyOnce_10_rocksdb|45.9167|43.0983|-6.14%| > |OneInput_KeyBy_Eager_ExactlyOnce_100_heap|212.0816667|200.727|-5.35%| > |OneInput_Broadcast_LazyFromSource_AtLeastOnce_1024_rocksdb|1718.245|1614.381667|-6.04%| > |OneInput_Rescale_LazyFromSource_AtLeastOnce_10_heap|46.12|43.5517|-5.57%| > |OneInput_Rebalance_LazyFromSource_AtLeastOnce_100_rocksdb|212.038|200.388|-5.49%| > |OneInput_KeyBy_LazyFromSource_AtLeastOnce_1024_heap|1762.048333|1606.408333|-8.83%| > 
|OneInput_Broadcast_Eager_AtLeastOnce_10_rocksdb|46.0583|43.4967|-5.56%| > |OneInput_Rescale_Eager_AtLeastOnce_100_heap|212.233|201.188|-5.20%| > |OneInput_Rebalance_Eager_AtLeastOnce_1024_rocksdb|1720.66|1616.85|-6.03%| > |OneInput_KeyBy_Eager_AtLeastOnce_10_heap|46.14|43.6233|-5.45%| > |TwoInputs_Broadcast_LazyFromSource_ExactlyOnce_100_rocksdb|156.918|152.957|-2.52%| > |TwoInputs_Rescale_LazyFromSource_ExactlyOnce_1024_heap|1415.511667|1300.1|-8.15%| > |TwoInputs_Rebalance_LazyFromSource_ExactlyOnce_10_rocksdb|34.2967|34.1667|-0.38%| > |TwoInputs_KeyBy_LazyFromSource_ExactlyOnce_100_heap|158.353|151.848|-4.11%| > |TwoInputs_Broadcast_Eager_ExactlyOnce_1024_rocksdb|1373.406667|1300.056667|-5.34%| > |TwoInputs_Rescale_Eager_ExactlyOnce_10_heap|34.5717|32.0967|-7.16%| > |TwoInputs_Rebalance_Eager_ExactlyOnce_100_rocksdb|158.655|147.44|-7.07%| > |TwoInputs_KeyBy_Eager_ExactlyOnce_1024_heap|1356.611667|1292.386667|-4.73%| > |TwoInputs_Broadcast_LazyFromSource_AtLeastOnce_10_rocksdb|34.01|33.205|-2.37%| > |TwoInputs_Rescale_LazyFromSource_AtLeastOnce_100_heap|149.588|145.997|-2.40%| > |TwoInputs_Rebalance_LazyFromSource_AtLeastOnce_1024_rocksdb|1359.74|1299.156667|-4.46%| > |TwoInputs_KeyBy_LazyFromSource_AtLeastOnce_10_heap|34.025|29.6833|-12.76%| > |TwoInputs_Broadcast_Eager_AtLeastOnce_100_rocksdb|157.303|151.4616667|-3.71%| > |TwoInputs_Rescale_Eager_AtLeastOnce_1024_heap|1368.74|1293.238333|-5.52%| > |TwoInputs_Rebalance_Eager_AtLeastOnce_10_rocksdb|34.325|33.285|-3.03%| > |TwoInputs_KeyBy_Eager_AtLeastOnce_100_heap|162.5116667|134.375|-17.31%| > It can be seen that the performance of 1.11 has a regression, basically > around 5%, and the maximum regression is 17%. This needs to be checked. 
> the test code: > flink-1.10.0: > [https://github.com/Li-Aihua/flink/blob/test_suite_for_basic_operations/flink-end-to-end-perf-tests/flink-basic-operations/src/main/java/org/apache/flink/basic/operations/PerformanceTestJob.java] > flink-1.11.0: > [https://github.com/Li-Aihua/flink/blob/test_suite_for_basic_operations_1.11/flink-end-to-end-perf-tests/flink-basic-operations/src/main/java/org/apache/flink/basic/operations/PerformanceTestJob.java] > submit cmd like this: > bin/flink run -d -m 192.168.39.246:8081 -c > org.apache.flink.basic.operations.PerformanceTestJob > /home/admin/flink-basic-operations_2.11-1.10-SNAPSHOT.jar --topologyName > OneInput --LogicalAttributesofEdges Broadcast --ScheduleMode LazyFromSource > --CheckpointMode ExactlyOnce --recordSize 10 --stateBackend rocksdb > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[GitHub] [flink] flinkbot edited a comment on pull request #12768: [FLINK-17804][parquet] Follow Parquet spec when decoding DECIMAL
flinkbot edited a comment on pull request #12768: URL: https://github.com/apache/flink/pull/12768#issuecomment-649142405 ## CI report: * 066e41b9e06caf58e4ef5a8fbe6ea49b0bc9168e Azure: [PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=4025) Bot commands The @flinkbot bot supports the following commands: - `@flinkbot run travis` re-run the last Travis build - `@flinkbot run azure` re-run the last Azure build This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [flink] flinkbot commented on pull request #12768: [FLINK-17804][parquet] Follow Parquet spec when decoding DECIMAL
flinkbot commented on pull request #12768: URL: https://github.com/apache/flink/pull/12768#issuecomment-649142405 ## CI report: * 066e41b9e06caf58e4ef5a8fbe6ea49b0bc9168e UNKNOWN
[jira] [Updated] (FLINK-18427) Job failed under java 11
[ https://issues.apache.org/jira/browse/FLINK-18427?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhang Hao updated FLINK-18427: -- Summary: Job failed under java 11 (was: Job failed on java 11) > Job failed under java 11 > > > Key: FLINK-18427 > URL: https://issues.apache.org/jira/browse/FLINK-18427 > Project: Flink > Issue Type: Bug > Components: API / Core, API / DataStream >Reporter: Zhang Hao >Priority: Critical > > flink version:1.10.0 > deployment mode:cluster > os:linux redhat7.5 > Job parallelism:greater than 1 > My job runs normally under Java 8 but fails under Java 11. Exception info is > below; Netty fails to send a message. In addition, I found the job would fail > when tasks were distributed across multiple nodes; if I set the job's parallelism to 1, the job > also runs normally under Java 11. > > 2020-06-24 > 09:52:16 org.apache.flink.runtime.io.network.netty.exception.LocalTransportException: > Sending the partition request to '/170.0.50.19:33320' failed. at > org.apache.flink.runtime.io.network.netty.NettyPartitionRequestClient$1.operationComplete(NettyPartitionRequestClient.java:124) > at > org.apache.flink.runtime.io.network.netty.NettyPartitionRequestClient$1.operationComplete(NettyPartitionRequestClient.java:115) > at > org.apache.flink.shaded.netty4.io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:500) > at > org.apache.flink.shaded.netty4.io.netty.util.concurrent.DefaultPromise.notifyListenersNow(DefaultPromise.java:474) > at > org.apache.flink.shaded.netty4.io.netty.util.concurrent.DefaultPromise.notifyListeners(DefaultPromise.java:413) > at > org.apache.flink.shaded.netty4.io.netty.util.concurrent.DefaultPromise.setValue0(DefaultPromise.java:538) > at > org.apache.flink.shaded.netty4.io.netty.util.concurrent.DefaultPromise.setFailure0(DefaultPromise.java:531) > at > org.apache.flink.shaded.netty4.io.netty.util.concurrent.DefaultPromise.tryFailure(DefaultPromise.java:111) > at > 
org.apache.flink.shaded.netty4.io.netty.util.internal.PromiseNotificationUtil.tryFailure(PromiseNotificationUtil.java:64) > at > org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.notifyOutboundHandlerException(AbstractChannelHandlerContext.java:818) > at > org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.invokeWrite0(AbstractChannelHandlerContext.java:718) > at > org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.invokeWrite(AbstractChannelHandlerContext.java:708) > at > org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.access$1700(AbstractChannelHandlerContext.java:56) > at > org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext$AbstractWriteTask.write(AbstractChannelHandlerContext.java:1102) > at > org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext$WriteAndFlushTask.write(AbstractChannelHandlerContext.java:1149) > at > org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext$AbstractWriteTask.run(AbstractChannelHandlerContext.java:1073) > at > org.apache.flink.shaded.netty4.io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:163) > at > org.apache.flink.shaded.netty4.io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:416) > at > org.apache.flink.shaded.netty4.io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:515) > at > org.apache.flink.shaded.netty4.io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:918) > at > org.apache.flink.shaded.netty4.io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) > at java.base/java.lang.Thread.run(Thread.java:834)Caused by: > java.io.IOException: Error while serializing message: > PartitionRequest(8059a0b47f7ba0ff814ea52427c584e7@6750c1170c861176ad3ceefe9b02f36e:0:2) > at > 
org.apache.flink.runtime.io.network.netty.NettyMessage$NettyMessageEncoder.write(NettyMessage.java:177) > at > org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.invokeWrite0(AbstractChannelHandlerContext.java:716) > ... 11 more Caused by: java.io.IOException: java.lang.OutOfMemoryError: > Direct buffer memory at > org.apache.flink.runtime.io.network.netty.NettyMessage$PartitionRequest.write(NettyMessage.java:497) > at > org.apache.flink.runtime.io.network.netty.NettyMessage$NettyMessageEncoder.write(NettyMessage.java:174) > ... 12 more Caused by: java.lang.OutOfMemoryError: Direct buffer memory at > java.base/java.nio.Bits.reserveMemory(Bits.java:175) at > java.base/java.nio.DirectByteBuffer.<init>(DirectByteBuffer.java:118) at > java.base/java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:317) at > org.apache.
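A hedged aside on the report above: the root cause is `java.lang.OutOfMemoryError: Direct buffer memory`, so one thing worth trying while this is investigated is giving the task managers more off-heap (direct) memory. The keys below are Flink 1.10 memory options; the values are placeholders chosen for illustration, not a confirmed fix:

```yaml
# conf/flink-conf.yaml — example values only (defaults are 128m and 0 bytes)
taskmanager.memory.framework.off-heap.size: 256m
taskmanager.memory.task.off-heap.size: 128m
```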
[jira] [Updated] (FLINK-18427) Job failed on java 11
[ https://issues.apache.org/jira/browse/FLINK-18427?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhang Hao updated FLINK-18427: -- Summary: Job failed on java 11 (was: Job failed under java 11) > Job failed on java 11 > - > > Key: FLINK-18427 > URL: https://issues.apache.org/jira/browse/FLINK-18427 > Project: Flink > Issue Type: Bug > Components: API / Core, API / DataStream >Reporter: Zhang Hao >Priority: Critical > > flink version:1.10.0 > deployment mode:cluster > os:linux redhat7.5 > Job parallelism:greater than 1 > My job runs normally under Java 8 but fails under Java 11. Exception info is > below; Netty fails to send a message. In addition, I found the job would fail > when tasks were distributed across multiple nodes; if I set the job's parallelism to 1, the job > also runs normally under Java 11. > > 2020-06-24 > 09:52:16 org.apache.flink.runtime.io.network.netty.exception.LocalTransportException: > Sending the partition request to '/170.0.50.19:33320' failed. at > org.apache.flink.runtime.io.network.netty.NettyPartitionRequestClient$1.operationComplete(NettyPartitionRequestClient.java:124) > at > org.apache.flink.runtime.io.network.netty.NettyPartitionRequestClient$1.operationComplete(NettyPartitionRequestClient.java:115) > at > org.apache.flink.shaded.netty4.io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:500) > at > org.apache.flink.shaded.netty4.io.netty.util.concurrent.DefaultPromise.notifyListenersNow(DefaultPromise.java:474) > at > org.apache.flink.shaded.netty4.io.netty.util.concurrent.DefaultPromise.notifyListeners(DefaultPromise.java:413) > at > org.apache.flink.shaded.netty4.io.netty.util.concurrent.DefaultPromise.setValue0(DefaultPromise.java:538) > at > org.apache.flink.shaded.netty4.io.netty.util.concurrent.DefaultPromise.setFailure0(DefaultPromise.java:531) > at > org.apache.flink.shaded.netty4.io.netty.util.concurrent.DefaultPromise.tryFailure(DefaultPromise.java:111) > at > 
org.apache.flink.shaded.netty4.io.netty.util.internal.PromiseNotificationUtil.tryFailure(PromiseNotificationUtil.java:64) > at > org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.notifyOutboundHandlerException(AbstractChannelHandlerContext.java:818) > at > org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.invokeWrite0(AbstractChannelHandlerContext.java:718) > at > org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.invokeWrite(AbstractChannelHandlerContext.java:708) > at > org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.access$1700(AbstractChannelHandlerContext.java:56) > at > org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext$AbstractWriteTask.write(AbstractChannelHandlerContext.java:1102) > at > org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext$WriteAndFlushTask.write(AbstractChannelHandlerContext.java:1149) > at > org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext$AbstractWriteTask.run(AbstractChannelHandlerContext.java:1073) > at > org.apache.flink.shaded.netty4.io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:163) > at > org.apache.flink.shaded.netty4.io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:416) > at > org.apache.flink.shaded.netty4.io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:515) > at > org.apache.flink.shaded.netty4.io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:918) > at > org.apache.flink.shaded.netty4.io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) > at java.base/java.lang.Thread.run(Thread.java:834)Caused by: > java.io.IOException: Error while serializing message: > PartitionRequest(8059a0b47f7ba0ff814ea52427c584e7@6750c1170c861176ad3ceefe9b02f36e:0:2) > at > 
org.apache.flink.runtime.io.network.netty.NettyMessage$NettyMessageEncoder.write(NettyMessage.java:177) > at > org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.invokeWrite0(AbstractChannelHandlerContext.java:716) > ... 11 more Caused by: java.io.IOException: java.lang.OutOfMemoryError: > Direct buffer memory at > org.apache.flink.runtime.io.network.netty.NettyMessage$PartitionRequest.write(NettyMessage.java:497) > at > org.apache.flink.runtime.io.network.netty.NettyMessage$NettyMessageEncoder.write(NettyMessage.java:174) > ... 12 more Caused by: java.lang.OutOfMemoryError: Direct buffer memory at > java.base/java.nio.Bits.reserveMemory(Bits.java:175) at > java.base/java.nio.DirectByteBuffer.<init>(DirectByteBuffer.java:118) at > java.base/java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:317) at > org.apache.flink.
[jira] [Reopened] (FLINK-18116) Manually test E2E performance on Flink 1.11
[ https://issues.apache.org/jira/browse/FLINK-18116?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aihua Li reopened FLINK-18116: -- I ran end-to-end performance tests between Release-1.10 and Release-1.11 and found that 1.11 has a regression. I submitted a bug, https://issues.apache.org/jira/browse/FLINK-18433, to record the details. > Manually test E2E performance on Flink 1.11 > --- > > Key: FLINK-18116 > URL: https://issues.apache.org/jira/browse/FLINK-18116 > Project: Flink > Issue Type: Sub-task > Components: API / Core, API / DataStream, API / State Processor, > Build System, Client / Job Submission >Affects Versions: 1.11.0 >Reporter: Aihua Li >Assignee: Aihua Li >Priority: Blocker > Labels: release-testing > Fix For: 1.11.0 > > > It mainly verifies that performance is no worse than the 1.10 release by > checking the metrics of the end-to-end performance tests, such as QPS and latency. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (FLINK-18433) From the end-to-end performance test results, 1.11 has a
[ https://issues.apache.org/jira/browse/FLINK-18433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aihua Li updated FLINK-18433: - Summary: From the end-to-end performance test results, 1.11 has a (was: 1.11 has a regression) > From the end-to-end performance test results, 1.11 has a > - > > Key: FLINK-18433 > URL: https://issues.apache.org/jira/browse/FLINK-18433 > Project: Flink > Issue Type: Bug > Components: API / Core, API / DataStream >Affects Versions: 1.11.1 > Environment: 3 machines > [|https://github.com/Li-Aihua/flink/blob/test_suite_for_basic_operations_1.11/flink-end-to-end-perf-tests/flink-basic-operations/src/main/java/org/apache/flink/basic/operations/PerformanceTestJob.java] >Reporter: Aihua Li >Priority: Major > > > I ran end-to-end performance tests between Release-1.10 and Release-1.11. > The results were as follows: > |scenarioName|release-1.10|release-1.11|change| > |OneInput_Broadcast_LazyFromSource_ExactlyOnce_10_rocksdb|46.175|43.8133|-5.11%| > |OneInput_Rescale_LazyFromSource_ExactlyOnce_100_heap|211.835|200.355|-5.42%| > |OneInput_Rebalance_LazyFromSource_ExactlyOnce_1024_rocksdb|1721.041667|1618.32|-5.97%| > |OneInput_KeyBy_LazyFromSource_ExactlyOnce_10_heap|46|43.615|-5.18%| > |OneInput_Broadcast_Eager_ExactlyOnce_100_rocksdb|212.105|199.688|-5.85%| > |OneInput_Rescale_Eager_ExactlyOnce_1024_heap|1754.64|1600.12|-8.81%| > |OneInput_Rebalance_Eager_ExactlyOnce_10_rocksdb|45.9167|43.0983|-6.14%| > |OneInput_KeyBy_Eager_ExactlyOnce_100_heap|212.0816667|200.727|-5.35%| > |OneInput_Broadcast_LazyFromSource_AtLeastOnce_1024_rocksdb|1718.245|1614.381667|-6.04%| > |OneInput_Rescale_LazyFromSource_AtLeastOnce_10_heap|46.12|43.5517|-5.57%| > |OneInput_Rebalance_LazyFromSource_AtLeastOnce_100_rocksdb|212.038|200.388|-5.49%| > |OneInput_KeyBy_LazyFromSource_AtLeastOnce_1024_heap|1762.048333|1606.408333|-8.83%| > |OneInput_Broadcast_Eager_AtLeastOnce_10_rocksdb|46.0583|43.4967|-5.56%| > 
|OneInput_Rescale_Eager_AtLeastOnce_100_heap|212.233|201.188|-5.20%| > |OneInput_Rebalance_Eager_AtLeastOnce_1024_rocksdb|1720.66|1616.85|-6.03%| > |OneInput_KeyBy_Eager_AtLeastOnce_10_heap|46.14|43.6233|-5.45%| > |TwoInputs_Broadcast_LazyFromSource_ExactlyOnce_100_rocksdb|156.918|152.957|-2.52%| > |TwoInputs_Rescale_LazyFromSource_ExactlyOnce_1024_heap|1415.511667|1300.1|-8.15%| > |TwoInputs_Rebalance_LazyFromSource_ExactlyOnce_10_rocksdb|34.2967|34.1667|-0.38%| > |TwoInputs_KeyBy_LazyFromSource_ExactlyOnce_100_heap|158.353|151.848|-4.11%| > |TwoInputs_Broadcast_Eager_ExactlyOnce_1024_rocksdb|1373.406667|1300.056667|-5.34%| > |TwoInputs_Rescale_Eager_ExactlyOnce_10_heap|34.5717|32.0967|-7.16%| > |TwoInputs_Rebalance_Eager_ExactlyOnce_100_rocksdb|158.655|147.44|-7.07%| > |TwoInputs_KeyBy_Eager_ExactlyOnce_1024_heap|1356.611667|1292.386667|-4.73%| > |TwoInputs_Broadcast_LazyFromSource_AtLeastOnce_10_rocksdb|34.01|33.205|-2.37%| > |TwoInputs_Rescale_LazyFromSource_AtLeastOnce_100_heap|149.588|145.997|-2.40%| > |TwoInputs_Rebalance_LazyFromSource_AtLeastOnce_1024_rocksdb|1359.74|1299.156667|-4.46%| > |TwoInputs_KeyBy_LazyFromSource_AtLeastOnce_10_heap|34.025|29.6833|-12.76%| > |TwoInputs_Broadcast_Eager_AtLeastOnce_100_rocksdb|157.303|151.4616667|-3.71%| > |TwoInputs_Rescale_Eager_AtLeastOnce_1024_heap|1368.74|1293.238333|-5.52%| > |TwoInputs_Rebalance_Eager_AtLeastOnce_10_rocksdb|34.325|33.285|-3.03%| > |TwoInputs_KeyBy_Eager_AtLeastOnce_100_heap|162.5116667|134.375|-17.31%| > It can be seen that the performance of 1.11 has a regression, basically > around 5%, and the maximum regression is 17%. This needs to be checked. 
> the test code: > flink-1.10.0: > [https://github.com/Li-Aihua/flink/blob/test_suite_for_basic_operations/flink-end-to-end-perf-tests/flink-basic-operations/src/main/java/org/apache/flink/basic/operations/PerformanceTestJob.java] > flink-1.11.0: > [https://github.com/Li-Aihua/flink/blob/test_suite_for_basic_operations_1.11/flink-end-to-end-perf-tests/flink-basic-operations/src/main/java/org/apache/flink/basic/operations/PerformanceTestJob.java] > submit cmd like this: > bin/flink run -d -m 192.168.39.246:8081 -c > org.apache.flink.basic.operations.PerformanceTestJob > /home/admin/flink-basic-operations_2.11-1.10-SNAPSHOT.jar --topologyName > OneInput --LogicalAttributesofEdges Broadcast --ScheduleMode LazyFromSource > --CheckpointMode ExactlyOnce --recordSize 10 --stateBackend rocksdb > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (FLINK-18433) 1.11 has a regression
Aihua Li created FLINK-18433: Summary: 1.11 has a regression Key: FLINK-18433 URL: https://issues.apache.org/jira/browse/FLINK-18433 Project: Flink Issue Type: Bug Components: API / Core, API / DataStream Affects Versions: 1.11.1 Environment: 3 machines [|https://github.com/Li-Aihua/flink/blob/test_suite_for_basic_operations_1.11/flink-end-to-end-perf-tests/flink-basic-operations/src/main/java/org/apache/flink/basic/operations/PerformanceTestJob.java] Reporter: Aihua Li I ran end-to-end performance tests between Release-1.10 and Release-1.11. The results were as follows: |scenarioName|release-1.10|release-1.11|change| |OneInput_Broadcast_LazyFromSource_ExactlyOnce_10_rocksdb|46.175|43.8133|-5.11%| |OneInput_Rescale_LazyFromSource_ExactlyOnce_100_heap|211.835|200.355|-5.42%| |OneInput_Rebalance_LazyFromSource_ExactlyOnce_1024_rocksdb|1721.041667|1618.32|-5.97%| |OneInput_KeyBy_LazyFromSource_ExactlyOnce_10_heap|46|43.615|-5.18%| |OneInput_Broadcast_Eager_ExactlyOnce_100_rocksdb|212.105|199.688|-5.85%| |OneInput_Rescale_Eager_ExactlyOnce_1024_heap|1754.64|1600.12|-8.81%| |OneInput_Rebalance_Eager_ExactlyOnce_10_rocksdb|45.9167|43.0983|-6.14%| |OneInput_KeyBy_Eager_ExactlyOnce_100_heap|212.0816667|200.727|-5.35%| |OneInput_Broadcast_LazyFromSource_AtLeastOnce_1024_rocksdb|1718.245|1614.381667|-6.04%| |OneInput_Rescale_LazyFromSource_AtLeastOnce_10_heap|46.12|43.5517|-5.57%| |OneInput_Rebalance_LazyFromSource_AtLeastOnce_100_rocksdb|212.038|200.388|-5.49%| |OneInput_KeyBy_LazyFromSource_AtLeastOnce_1024_heap|1762.048333|1606.408333|-8.83%| |OneInput_Broadcast_Eager_AtLeastOnce_10_rocksdb|46.0583|43.4967|-5.56%| |OneInput_Rescale_Eager_AtLeastOnce_100_heap|212.233|201.188|-5.20%| |OneInput_Rebalance_Eager_AtLeastOnce_1024_rocksdb|1720.66|1616.85|-6.03%| |OneInput_KeyBy_Eager_AtLeastOnce_10_heap|46.14|43.6233|-5.45%| |TwoInputs_Broadcast_LazyFromSource_ExactlyOnce_100_rocksdb|156.918|152.957|-2.52%| 
|TwoInputs_Rescale_LazyFromSource_ExactlyOnce_1024_heap|1415.511667|1300.1|-8.15%| |TwoInputs_Rebalance_LazyFromSource_ExactlyOnce_10_rocksdb|34.2967|34.1667|-0.38%| |TwoInputs_KeyBy_LazyFromSource_ExactlyOnce_100_heap|158.353|151.848|-4.11%| |TwoInputs_Broadcast_Eager_ExactlyOnce_1024_rocksdb|1373.406667|1300.056667|-5.34%| |TwoInputs_Rescale_Eager_ExactlyOnce_10_heap|34.5717|32.0967|-7.16%| |TwoInputs_Rebalance_Eager_ExactlyOnce_100_rocksdb|158.655|147.44|-7.07%| |TwoInputs_KeyBy_Eager_ExactlyOnce_1024_heap|1356.611667|1292.386667|-4.73%| |TwoInputs_Broadcast_LazyFromSource_AtLeastOnce_10_rocksdb|34.01|33.205|-2.37%| |TwoInputs_Rescale_LazyFromSource_AtLeastOnce_100_heap|149.588|145.997|-2.40%| |TwoInputs_Rebalance_LazyFromSource_AtLeastOnce_1024_rocksdb|1359.74|1299.156667|-4.46%| |TwoInputs_KeyBy_LazyFromSource_AtLeastOnce_10_heap|34.025|29.6833|-12.76%| |TwoInputs_Broadcast_Eager_AtLeastOnce_100_rocksdb|157.303|151.4616667|-3.71%| |TwoInputs_Rescale_Eager_AtLeastOnce_1024_heap|1368.74|1293.238333|-5.52%| |TwoInputs_Rebalance_Eager_AtLeastOnce_10_rocksdb|34.325|33.285|-3.03%| |TwoInputs_KeyBy_Eager_AtLeastOnce_100_heap|162.5116667|134.375|-17.31%| It can be seen that the performance of 1.11 has a regression, basically around 5%, and the maximum regression is 17%. This needs to be checked. 
the test code: flink-1.10.0: [https://github.com/Li-Aihua/flink/blob/test_suite_for_basic_operations/flink-end-to-end-perf-tests/flink-basic-operations/src/main/java/org/apache/flink/basic/operations/PerformanceTestJob.java] flink-1.11.0: [https://github.com/Li-Aihua/flink/blob/test_suite_for_basic_operations_1.11/flink-end-to-end-perf-tests/flink-basic-operations/src/main/java/org/apache/flink/basic/operations/PerformanceTestJob.java] submit cmd like this: bin/flink run -d -m 192.168.39.246:8081 -c org.apache.flink.basic.operations.PerformanceTestJob /home/admin/flink-basic-operations_2.11-1.10-SNAPSHOT.jar --topologyName OneInput --LogicalAttributesofEdges Broadcast --ScheduleMode LazyFromSource --CheckpointMode ExactlyOnce --recordSize 10 --stateBackend rocksdb -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (FLINK-17804) Follow the spec when decoding Parquet logical DECIMAL type
[ https://issues.apache.org/jira/browse/FLINK-17804?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17144545#comment-17144545 ] Sergii Mikhtoniuk commented on FLINK-17804: --- I've created [a PR|https://github.com/apache/flink/pull/12768] that should address the problem. [~lzljs3620320], will you be able to have a look? > Follow the spec when decoding Parquet logical DECIMAL type > -- > > Key: FLINK-17804 > URL: https://issues.apache.org/jira/browse/FLINK-17804 > Project: Flink > Issue Type: Bug > Components: Formats (JSON, Avro, Parquet, ORC, SequenceFile) >Affects Versions: 1.10.1 >Reporter: Sergii Mikhtoniuk >Priority: Major > Labels: decimal, parquet, pull-request-available, spark > > When reading a Parquet file (produced by Spark 2.4.0 with default > configuration) Flink's {{ParquetRowInputFormat}} fails with > {{NumberFormatException}}. > After debugging this it seems that Flink doesn't follow the Parquet spec on > [encoding DECIMAL logical > type|https://github.com/apache/parquet-format/blob/master/LogicalTypes.md#decimal] > The Parquet schema for this field is: > {code} > optional fixed_len_byte_array(9) price_usd (DECIMAL(19,4)); > {code} > If I understand the spec correctly, it says that the value should contain a > binary representation of an unscaled decimal. Flink's > [RowConverter|https://github.com/apache/flink/blob/release-1.10.1/flink-formats/flink-parquet/src/main/java/org/apache/flink/formats/parquet/utils/RowConverter.java#L202] > however treats it as a base-10 UTF-8 string. 
> What Flink essentially is doing: > {code} > val binary = Array[Byte](0, 0, 0, 0, 0, 11, -97, 118, -64) > val decimal = new java.math.BigDecimal(new String(binary, > "UTF-8").toCharArray) > {code} > What I think spec suggests: > {code} > val binary = Array[Byte](0, 0, 0, 0, 0, 11, -97, 118, -64) > val unscaled = new java.math.BigInteger(binary) > val decimal = new java.math.BigDecimal(unscaled) > {code} > Error stacktrace: > {code} > java.lang.NumberFormatException > at java.math.BigDecimal.(BigDecimal.java:497) > at java.math.BigDecimal.(BigDecimal.java:383) > at java.math.BigDecimal.(BigDecimal.java:680) > at > org.apache.flink.formats.parquet.utils.RowConverter$RowPrimitiveConverter.addBinary(RowConverter.java:202) > at > org.apache.parquet.column.impl.ColumnReaderImpl$2$6.writeValue(ColumnReaderImpl.java:317) > at > org.apache.parquet.column.impl.ColumnReaderImpl.writeCurrentValueToConverter(ColumnReaderImpl.java:367) > at > org.apache.parquet.io.RecordReaderImplementation.read(RecordReaderImplementation.java:406) > at > org.apache.flink.formats.parquet.utils.ParquetRecordReader.readNextRecord(ParquetRecordReader.java:235) > at > org.apache.flink.formats.parquet.utils.ParquetRecordReader.reachEnd(ParquetRecordReader.java:207) > at > org.apache.flink.formats.parquet.ParquetInputFormat.reachedEnd(ParquetInputFormat.java:233) > at > org.apache.flink.streaming.api.functions.source.ContinuousFileReaderOperator$SplitReader.run(ContinuousFileReaderOperator.java:327) > {code} > Thanks! -- This message was sent by Atlassian Jira (v8.3.4#803005)
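The spec-compliant decoding suggested in the issue body can be sketched in self-contained Java as follows. The class and helper names here are hypothetical (this is not Flink's `RowConverter` code); it only illustrates the two's-complement interpretation the spec calls for, using the same sample bytes as the issue:

```java
import java.math.BigDecimal;
import java.math.BigInteger;

public class DecimalDecodeSketch {

    // Interpret a fixed_len_byte_array / binary payload as a big-endian
    // two's-complement unscaled integer, then attach the scale from the
    // Parquet schema (DECIMAL(19,4) -> scale 4).
    static BigDecimal decodeUnscaledBytes(byte[] bytes, int scale) {
        return new BigDecimal(new BigInteger(bytes), scale);
    }

    public static void main(String[] args) {
        byte[] binary = {0, 0, 0, 0, 0, 11, -97, 118, -64};
        // 0x0B9F76C0 == 195000000; with scale 4 this is 19500.0000
        System.out.println(decodeUnscaledBytes(binary, 4)); // prints 19500.0000
    }
}
```

Treating the same bytes as a base-10 UTF-8 string, as the current `RowConverter` does, throws `NumberFormatException` because the payload is not ASCII digits.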
[GitHub] [flink] flinkbot commented on pull request #12768: [FLINK-17804][parquet] Follow Parquet spec when decoding DECIMAL
flinkbot commented on pull request #12768: URL: https://github.com/apache/flink/pull/12768#issuecomment-649137157 Thanks a lot for your contribution to the Apache Flink project. I'm the @flinkbot. I help the community to review your pull request. We will use this comment to track the progress of the review. ## Automated Checks Last check on commit 066e41b9e06caf58e4ef5a8fbe6ea49b0bc9168e (Thu Jun 25 00:04:49 UTC 2020) **Warnings:** * **1 pom.xml files were touched**: Check for build and licensing issues. * No documentation files were touched! Remember to keep the Flink docs up to date! * **This pull request references an unassigned [Jira ticket](https://issues.apache.org/jira/browse/FLINK-17804).** According to the [code contribution guide](https://flink.apache.org/contributing/contribute-code.html), tickets need to be assigned before starting with the implementation work. Mention the bot in a comment to re-run the automated checks. ## Review Progress * ❓ 1. The [description] looks good. * ❓ 2. There is [consensus] that the contribution should go into to Flink. * ❓ 3. Needs [attention] from. * ❓ 4. The change fits into the overall [architecture]. * ❓ 5. Overall code [quality] is good. Please see the [Pull Request Review Guide](https://flink.apache.org/contributing/reviewing-prs.html) for a full explanation of the review process. The Bot is tracking the review progress through labels. Labels are applied according to the order of the review items. 
For consensus, approval by a Flink committer or PMC member is required Bot commands The @flinkbot bot supports the following commands: - `@flinkbot approve description` to approve one or more aspects (aspects: `description`, `consensus`, `architecture` and `quality`) - `@flinkbot approve all` to approve all aspects - `@flinkbot approve-until architecture` to approve everything until `architecture` - `@flinkbot attention @username1 [@username2 ..]` to require somebody's attention - `@flinkbot disapprove architecture` to remove an approval you gave earlier This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [flink] sergiimk opened a new pull request #12768: [FLINK-17804][parquet] Follow Parquet spec when decoding DECIMAL
sergiimk opened a new pull request #12768: URL: https://github.com/apache/flink/pull/12768 ## What is the purpose of the change Current implementation of the Parquet reader decodes DECIMAL types incorrectly. It doesn't follow the spec and will either throw `NumberFormatException` or read incorrect values even on Parquet files produced by Flink itself. This PR updates the `ParquetSchemaConverter` and `RowConverter` to handle all 4 encoding styles specified by the [spec](https://github.com/apache/parquet-format/blob/master/LogicalTypes.md#decimal): - `int32` - `int64` - `fixed_len_byte_array` - `binary` ## Brief change log - Parquet reader will decode DECIMAL type according to the spec ## Verifying this change This change added tests and can be verified as follows: - Added tests that validate DECIMAL decoding with all backing types specified by Parquet spec - Manually verified the change by reading Parquet files generated by Spark ## Does this pull request potentially affect one of the following parts: - Dependencies (does it add or upgrade a dependency): no - The public API, i.e., is any changed class annotated with `@Public(Evolving)`: no - The serializers: yes - The runtime per-record code paths (performance sensitive): yes - Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Kubernetes/Yarn/Mesos, ZooKeeper: no - The S3 file system connector: no ## Documentation - Does this pull request introduce a new feature? no - If yes, how is the feature documented? not applicable This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
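The four encoding styles listed in the PR description can be illustrated with a short sketch. This is a hedged approximation under the Parquet spec, not the actual code in the PR; the class and method names are invented. Per the spec, `int32`/`int64` carry the unscaled value directly, while `binary` and `fixed_len_byte_array` carry it as big-endian two's-complement bytes:

```java
import java.math.BigDecimal;
import java.math.BigInteger;

public class ParquetDecimalSketch {

    // int32 / int64: the physical integer IS the unscaled decimal value.
    static BigDecimal fromLong(long unscaled, int scale) {
        return BigDecimal.valueOf(unscaled, scale);
    }

    // binary / fixed_len_byte_array: big-endian two's-complement unscaled value;
    // BigInteger's byte[] constructor applies exactly this interpretation,
    // including sign extension for negative values.
    static BigDecimal fromBytes(byte[] unscaled, int scale) {
        return new BigDecimal(new BigInteger(unscaled), scale);
    }

    public static void main(String[] args) {
        System.out.println(fromLong(12345, 2));             // 123.45
        System.out.println(fromBytes(new byte[]{-1}, 0));   // -1 (sign-extended)
    }
}
```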
[jira] [Updated] (FLINK-17804) Follow the spec when decoding Parquet logical DECIMAL type
[ https://issues.apache.org/jira/browse/FLINK-17804?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated FLINK-17804: --- Labels: decimal parquet pull-request-available spark (was: decimal parquet spark) > Follow the spec when decoding Parquet logical DECIMAL type > -- > > Key: FLINK-17804 > URL: https://issues.apache.org/jira/browse/FLINK-17804 > Project: Flink > Issue Type: Bug > Components: Formats (JSON, Avro, Parquet, ORC, SequenceFile) >Affects Versions: 1.10.1 >Reporter: Sergii Mikhtoniuk >Priority: Major > Labels: decimal, parquet, pull-request-available, spark > > When reading a Parquet file (produced by Spark 2.4.0 with default > configuration) Flink's {{ParquetRowInputFormat}} fails with > {{NumberFormatException}}. > After debugging this it seems that Flink doesn't follow the Parquet spec on > [encoding DECIMAL logical > type|https://github.com/apache/parquet-format/blob/master/LogicalTypes.md#decimal] > The Parquet schema for this field is: > {code} > optional fixed_len_byte_array(9) price_usd (DECIMAL(19,4)); > {code} > If I understand the spec correctly, it says that the value should contain a > binary representation of an unscaled decimal. Flink's > [RowConverter|https://github.com/apache/flink/blob/release-1.10.1/flink-formats/flink-parquet/src/main/java/org/apache/flink/formats/parquet/utils/RowConverter.java#L202] > however treats it as a base-10 UTF-8 string. 
> What Flink essentially is doing: > {code} > val binary = Array[Byte](0, 0, 0, 0, 0, 11, -97, 118, -64) > val decimal = new java.math.BigDecimal(new String(binary, > "UTF-8").toCharArray) > {code} > What I think spec suggests: > {code} > val binary = Array[Byte](0, 0, 0, 0, 0, 11, -97, 118, -64) > val unscaled = new java.math.BigInteger(binary) > val decimal = new java.math.BigDecimal(unscaled) > {code} > Error stacktrace: > {code} > java.lang.NumberFormatException > at java.math.BigDecimal.<init>(BigDecimal.java:497) > at java.math.BigDecimal.<init>(BigDecimal.java:383) > at java.math.BigDecimal.<init>(BigDecimal.java:680) > at > org.apache.flink.formats.parquet.utils.RowConverter$RowPrimitiveConverter.addBinary(RowConverter.java:202) > at > org.apache.parquet.column.impl.ColumnReaderImpl$2$6.writeValue(ColumnReaderImpl.java:317) > at > org.apache.parquet.column.impl.ColumnReaderImpl.writeCurrentValueToConverter(ColumnReaderImpl.java:367) > at > org.apache.parquet.io.RecordReaderImplementation.read(RecordReaderImplementation.java:406) > at > org.apache.flink.formats.parquet.utils.ParquetRecordReader.readNextRecord(ParquetRecordReader.java:235) > at > org.apache.flink.formats.parquet.utils.ParquetRecordReader.reachEnd(ParquetRecordReader.java:207) > at > org.apache.flink.formats.parquet.ParquetInputFormat.reachedEnd(ParquetInputFormat.java:233) > at > org.apache.flink.streaming.api.functions.source.ContinuousFileReaderOperator$SplitReader.run(ContinuousFileReaderOperator.java:327) > {code} > Thanks! -- This message was sent by Atlassian Jira (v8.3.4#803005)
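The two interpretations can be checked against the report's bytes with a small standalone Java program (a sketch of the behavior, not Flink code; the class name is illustrative):

```java
import java.math.BigDecimal;
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;

public class Flink17804Check {
    public static void main(String[] args) {
        // The fixed_len_byte_array(9) payload from the bug report,
        // declared as DECIMAL(19,4) in the Parquet schema.
        byte[] binary = {0, 0, 0, 0, 0, 11, -97, 118, -64};

        // Spec-compliant: big-endian two's-complement unscaled value.
        BigDecimal decimal = new BigDecimal(new BigInteger(binary), 4);
        System.out.println(decimal); // 19500.0000

        // The base-10-string interpretation fails, because the bytes
        // are not a UTF-8 digit string.
        try {
            new BigDecimal(new String(binary, StandardCharsets.UTF_8).toCharArray());
            System.out.println("unexpectedly parsed");
        } catch (NumberFormatException e) {
            System.out.println("NumberFormatException, as in the report");
        }
    }
}
```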
[GitHub] [flink] flinkbot edited a comment on pull request #12722: [FLINK-18064][docs] Added unaligned checkpointing to docs.
flinkbot edited a comment on pull request #12722: URL: https://github.com/apache/flink/pull/12722#issuecomment-646626465 ## CI report: * 91dfe90937ec1c2680022a12a1c4984fd59e61cf UNKNOWN * f818134b647f6e34bade6189640bd81575f0162c Azure: [SUCCESS](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=4021) Bot commands The @flinkbot bot supports the following commands: - `@flinkbot run travis` re-run the last Travis build - `@flinkbot run azure` re-run the last Azure build This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Issue Comment Deleted] (FLINK-18116) Manually test E2E performance on Flink 1.11
[ https://issues.apache.org/jira/browse/FLINK-18116?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aihua Li updated FLINK-18116: - Comment: was deleted (was: I mainly ran the stability test developed by Ali: simulating online abnormal conditions (such as network interruption, a full disk, the JM/AM process being killed, the TM throwing an exception, etc.) to check whether Flink jobs recover automatically. The test lasted 5 hours and simulated multiple combined abnormal scenarios; the Flink job returned to normal and checkpoints could be created. The test passed.) > Manually test E2E performance on Flink 1.11 > --- > > Key: FLINK-18116 > URL: https://issues.apache.org/jira/browse/FLINK-18116 > Project: Flink > Issue Type: Sub-task > Components: API / Core, API / DataStream, API / State Processor, > Build System, Client / Job Submission >Affects Versions: 1.11.0 >Reporter: Aihua Li >Assignee: Aihua Li >Priority: Blocker > Labels: release-testing > Fix For: 1.11.0 > > > It mainly verifies that performance is not worse than the 1.10 release by > checking end-to-end performance test metrics such as QPS and latency. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Closed] (FLINK-18115) Manually test fault-tolerance stability on Flink 1.11
[ https://issues.apache.org/jira/browse/FLINK-18115?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aihua Li closed FLINK-18115. Resolution: Done I mainly ran the stability test developed by Ali: simulating online abnormal conditions (such as network interruption, a full disk, the JM/AM process being killed, the TM throwing an exception, etc.) to check whether Flink jobs recover automatically. The test lasted 5 hours and simulated multiple combined abnormal scenarios; the Flink job returned to normal and checkpoints could be created. The test passed. > Manually test fault-tolerance stability on Flink 1.11 > - > > Key: FLINK-18115 > URL: https://issues.apache.org/jira/browse/FLINK-18115 > Project: Flink > Issue Type: Sub-task > Components: API / Core, API / State Processor, Build System, Client > / Job Submission >Affects Versions: 1.11.0 >Reporter: Aihua Li >Assignee: Aihua Li >Priority: Blocker > Labels: release-testing > Fix For: 1.11.0 > > > It mainly checks that the Flink job can recover from various abnormal > situations, including a full disk, network interruption, ZooKeeper being unreachable, > RPC message timeouts, etc. > If the job can't be recovered, the test is considered failed. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[GitHub] [flink] flinkbot edited a comment on pull request #12727: [FLINK-17292][docs] Translate Fault Tolerance training lesson to Chinese
flinkbot edited a comment on pull request #12727: URL: https://github.com/apache/flink/pull/12727#issuecomment-647010602 ## CI report: * e57e583260e23ec0c0b4bbbdabd51ec2a08ac726 Azure: [SUCCESS](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=4018) Bot commands The @flinkbot bot supports the following commands: - `@flinkbot run travis` re-run the last Travis build - `@flinkbot run azure` re-run the last Azure build This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Commented] (FLINK-9278) Allow restore savepoint with some SQL queries added/removed
[ https://issues.apache.org/jira/browse/FLINK-9278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17144418#comment-17144418 ] Samrat Bhattacharya commented on FLINK-9278: [~fhueske]: Any update on this issue? > Allow restore savepoint with some SQL queries added/removed > --- > > Key: FLINK-9278 > URL: https://issues.apache.org/jira/browse/FLINK-9278 > Project: Flink > Issue Type: Improvement > Components: Table SQL / API >Affects Versions: 1.4.2 >Reporter: Adrian Hains >Assignee: vinoyang >Priority: Major > > We are running a Flink job that contains multiple SQL queries. This is > configured by calling sqlQuery(String) one time for each SQL query, on a > single instance of StreamTableEnvironment. The queries are simple > aggregations with a tumble window. > Currently I can configure my environment with queries Q1, Q2, and Q3, create > a savepoint, and restart the job from that savepoint if the same set of SQL > queries are used. > If I remove some queries and add some others, Q2, Q4, and Q3, I am unable to > restart the job from the same savepoint. This behavior is expected, as the > documentation clearly describes that the operator IDs are generated if they > are not explicitly defined, and they cannot be explicitly defined when using > flink SQL. > I would like to be able to specify a scoping operator id prefix when > registering a SQL query to a StreamTableEnvironment. This can then be used to > programmatically generate unique IDs for each of the operators created to > execute the SQL queries. For example, if I specify a prefix of "ID:Q2:" for > my Q2 query, and I restart the job with an identical SQL query for this > prefix, then I would be able to restore the state for this query even in the > presence of other queries being added or removed to the job graph. -- This message was sent by Atlassian Jira (v8.3.4#803005)
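No such scoping API exists in Flink today; as a rough illustration of the idea, a user-supplied prefix would make an operator's ID a pure function of the query it belongs to rather than of its position in the overall job graph. A standalone sketch (all names hypothetical, not a Flink API):

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical sketch: derive a stable operator ID from a query prefix
// plus the operator's role within that query. With such IDs, adding or
// removing other queries does not change this query's IDs, so its state
// can still be matched when restoring from a savepoint.
public class ScopedOperatorIds {
    static String operatorId(String queryPrefix, String operatorRole) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            byte[] h = md.digest((queryPrefix + "/" + operatorRole).getBytes(StandardCharsets.UTF_8));
            StringBuilder sb = new StringBuilder();
            for (int i = 0; i < 8; i++) {
                sb.append(String.format("%02x", h[i])); // 16 hex chars
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // "ID:Q2:" yields the same IDs no matter which other queries exist.
        System.out.println(operatorId("ID:Q2:", "tumble-window-agg"));
        System.out.println(operatorId("ID:Q3:", "tumble-window-agg"));
    }
}
```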
[GitHub] [flink] flinkbot edited a comment on pull request #12727: [FLINK-17292][docs] Translate Fault Tolerance training lesson to Chinese
flinkbot edited a comment on pull request #12727: URL: https://github.com/apache/flink/pull/12727#issuecomment-647010602 ## CI report: * 99d95bf94b1604274c75ad4193aec6ece9416d6d Azure: [SUCCESS](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=4017) * e57e583260e23ec0c0b4bbbdabd51ec2a08ac726 Azure: [PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=4018) Bot commands The @flinkbot bot supports the following commands: - `@flinkbot run travis` re-run the last Travis build - `@flinkbot run azure` re-run the last Azure build This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Created] (FLINK-18432) add open and close methods for ElasticsearchSinkFunction interface
rinkako created FLINK-18432: --- Summary: add open and close methods for ElasticsearchSinkFunction interface Key: FLINK-18432 URL: https://issues.apache.org/jira/browse/FLINK-18432 Project: Flink Issue Type: Improvement Components: Connectors / ElasticSearch Reporter: rinkako Here is an example story: we want to sink data to ES with a day-rolling index name, using `DateTimeFormatter` and `ZonedDateTime` to generate a day-rolling postfix for the index pattern, as in `return Requests.indexRequest().index("mydoc_"+postfix)` in a custom `ElasticsearchSinkFunction`. `DateTimeFormatter` is not a serializable class, so it must be a transient field of the `ElasticsearchSinkFunction`, and it must be null-checked every time `process` is called (since the field is transient, it may be null on a distributed task manager); this initialization could instead happen in an `open` method that runs only once. So adding `open` and `close` methods to the `ElasticsearchSinkFunction` interface would handle this well, and users could control their sink function's life-cycle more flexibly. And, for compatibility, these two methods could have empty default implementations in the `ElasticsearchSinkFunction` interface. -- This message was sent by Atlassian Jira (v8.3.4#803005)
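A standalone sketch of the proposed shape, using a mock interface (the real `ElasticsearchSinkFunction` lives in the Flink Elasticsearch connector and has a different signature; all names here are illustrative):

```java
import java.io.Serializable;
import java.time.ZoneOffset;
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;

public class RollingIndexExample {

    // Mock of the proposed interface: open/close get empty default
    // implementations so existing implementations keep compiling.
    interface SinkFunction<T> extends Serializable {
        default void open() {}
        default void close() {}
        String process(T element);
    }

    // DateTimeFormatter is not Serializable, so it is created in open()
    // on each task instead of being shipped with the function, and
    // process() needs no null check.
    static class DayRollingSink implements SinkFunction<String> {
        private transient DateTimeFormatter formatter;

        @Override
        public void open() {
            formatter = DateTimeFormatter.ofPattern("yyyyMMdd");
        }

        @Override
        public String process(String element) {
            String postfix = ZonedDateTime.now(ZoneOffset.UTC).format(formatter);
            return "mydoc_" + postfix; // index name for this element
        }
    }

    public static void main(String[] args) {
        DayRollingSink sink = new DayRollingSink();
        sink.open(); // runs once per task
        System.out.println(sink.process("some-record"));
        sink.close();
    }
}
```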
[GitHub] [flink] zentol closed pull request #7820: [FLINK-11742][Metrics]Push metrics to Pushgateway without "instance"
zentol closed pull request #7820: URL: https://github.com/apache/flink/pull/7820 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [flink] zentol commented on pull request #7820: [FLINK-11742][Metrics]Push metrics to Pushgateway without "instance"
zentol commented on pull request #7820: URL: https://github.com/apache/flink/pull/7820#issuecomment-649039991 I will close this PR for now. Without all components having a well-defined and exposed ID we cannot provide a proper container ID (see FLINK-9543). Any interim solution would just require us to change the behavior later on again. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [flink] flinkbot edited a comment on pull request #12722: [FLINK-18064][docs] Added unaligned checkpointing to docs.
flinkbot edited a comment on pull request #12722: URL: https://github.com/apache/flink/pull/12722#issuecomment-646626465 ## CI report: * 91dfe90937ec1c2680022a12a1c4984fd59e61cf UNKNOWN * 8718566c0b74fd6c3c6327980c1ee1d2abc8eda8 Azure: [FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=3919) * f818134b647f6e34bade6189640bd81575f0162c Azure: [PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=4021) Bot commands The @flinkbot bot supports the following commands: - `@flinkbot run travis` re-run the last Travis build - `@flinkbot run azure` re-run the last Azure build This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [flink] flinkbot edited a comment on pull request #12722: [FLINK-18064][docs] Added unaligned checkpointing to docs.
flinkbot edited a comment on pull request #12722: URL: https://github.com/apache/flink/pull/12722#issuecomment-646626465 ## CI report: * 91dfe90937ec1c2680022a12a1c4984fd59e61cf UNKNOWN * 8718566c0b74fd6c3c6327980c1ee1d2abc8eda8 Azure: [FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=3919) * f818134b647f6e34bade6189640bd81575f0162c UNKNOWN Bot commands The @flinkbot bot supports the following commands: - `@flinkbot run travis` re-run the last Travis build - `@flinkbot run azure` re-run the last Azure build This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [flink] flinkbot edited a comment on pull request #12727: [FLINK-17292][docs] Translate Fault Tolerance training lesson to Chinese
flinkbot edited a comment on pull request #12727: URL: https://github.com/apache/flink/pull/12727#issuecomment-647010602 ## CI report: * 89242d0d76ab11a8d36234835ff121e16c92b17e Azure: [SUCCESS](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=4002) * 99d95bf94b1604274c75ad4193aec6ece9416d6d Azure: [PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=4017) * e57e583260e23ec0c0b4bbbdabd51ec2a08ac726 Azure: [PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=4018) Bot commands The @flinkbot bot supports the following commands: - `@flinkbot run travis` re-run the last Travis build - `@flinkbot run azure` re-run the last Azure build This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Resolved] (FLINK-18428) StreamExecutionEnvironment#continuousSource() method should be renamed to source()
[ https://issues.apache.org/jira/browse/FLINK-18428?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stephan Ewen resolved FLINK-18428. -- Resolution: Fixed Fixed in - 1.11.0 via 369d4a6ce46dddfd2a14ad66ac2e189bd4827158 - 1.12.0 (master) via 49b5103299374641662d66b5165441b532206b71 > StreamExecutionEnvironment#continuousSource() method should be renamed to > source() > -- > > Key: FLINK-18428 > URL: https://issues.apache.org/jira/browse/FLINK-18428 > Project: Flink > Issue Type: Bug > Components: API / DataStream >Reporter: Jiangjie Qin >Assignee: Jiangjie Qin >Priority: Blocker > Labels: pull-request-available > Fix For: 1.11.0 > > > The current code in master did not follow the [latest FLIP-27 > discussion|https://lists.apache.org/thread.html/r54287e9c9880916560bb97c38962db7b4d326056a7b23a9d0416e6a7%40%3Cdev.flink.apache.org%3E], > where we should have a {{StreamExecutionEnvironment#source()}} instead of > {{StreamExecutionEnvironment#continuousSource()}}. The concern was that the > latter would mislead the users to think that the execution is always in > streaming mode while it actually depends on the boundedness of the source. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Closed] (FLINK-18428) StreamExecutionEnvironment#continuousSource() method should be renamed to source()
[ https://issues.apache.org/jira/browse/FLINK-18428?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stephan Ewen closed FLINK-18428. > StreamExecutionEnvironment#continuousSource() method should be renamed to > source() > -- > > Key: FLINK-18428 > URL: https://issues.apache.org/jira/browse/FLINK-18428 > Project: Flink > Issue Type: Bug > Components: API / DataStream >Reporter: Jiangjie Qin >Assignee: Jiangjie Qin >Priority: Blocker > Labels: pull-request-available > Fix For: 1.11.0 > > > The current code in master did not follow the [latest FLIP-27 > discussion|https://lists.apache.org/thread.html/r54287e9c9880916560bb97c38962db7b4d326056a7b23a9d0416e6a7%40%3Cdev.flink.apache.org%3E], > where we should have a {{StreamExecutionEnvironment#source()}} instead of > {{StreamExecutionEnvironment#continuousSource()}}. The concern was that the > latter would mislead the users to think that the execution is always in > streaming mode while it actually depends on the boundedness of the source. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (FLINK-18430) Upgrade stability to @Public for CheckpointedFunction and CheckpointListener
[ https://issues.apache.org/jira/browse/FLINK-18430?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stephan Ewen resolved FLINK-18430. -- Resolution: Fixed Fixed in - 1.11.0 via 25cad5fb517bc8a6a6e2a37c78e43c474a99bbd9 - 1.12.0 (master) via 95b9adbeaa7058c4fc804a5277cbaa958485d63b > Upgrade stability to @Public for CheckpointedFunction and CheckpointListener > > > Key: FLINK-18430 > URL: https://issues.apache.org/jira/browse/FLINK-18430 > Project: Flink > Issue Type: Improvement > Components: API / DataStream >Reporter: Stephan Ewen >Assignee: Stephan Ewen >Priority: Critical > Fix For: 1.11.0 > > > The two interfaces {{CheckpointedFunction}} and {{CheckpointListener}} are > used by many users, but are still (for years now) marked as > {{@PublicEvolving}}. > I think this is not correct. They are very core to the DataStream API and are > used widely and should be treated as {{@Public}}. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Closed] (FLINK-18430) Upgrade stability to @Public for CheckpointedFunction and CheckpointListener
[ https://issues.apache.org/jira/browse/FLINK-18430?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stephan Ewen closed FLINK-18430. > Upgrade stability to @Public for CheckpointedFunction and CheckpointListener > > > Key: FLINK-18430 > URL: https://issues.apache.org/jira/browse/FLINK-18430 > Project: Flink > Issue Type: Improvement > Components: API / DataStream >Reporter: Stephan Ewen >Assignee: Stephan Ewen >Priority: Critical > Fix For: 1.11.0 > > > The two interfaces {{CheckpointedFunction}} and {{CheckpointListener}} are > used by many users, but are still (for years now) marked as > {{@PublicEvolving}}. > I think this is not correct. They are very core to the DataStream API and are > used widely and should be treated as {{@Public}}. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Closed] (FLINK-18429) Add default method for CheckpointListener.notifyCheckpointAborted(checkpointId)
[ https://issues.apache.org/jira/browse/FLINK-18429?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stephan Ewen closed FLINK-18429. > Add default method for > CheckpointListener.notifyCheckpointAborted(checkpointId) > --- > > Key: FLINK-18429 > URL: https://issues.apache.org/jira/browse/FLINK-18429 > Project: Flink > Issue Type: Bug > Components: API / DataStream >Reporter: Stephan Ewen >Assignee: Stephan Ewen >Priority: Blocker > Labels: pull-request-available > Fix For: 1.11.0 > > > The {{CheckpointListener}} interface is implemented by many users. Adding a > new method {{notifyCheckpointAborted(long)}} to the interface without a > default method breaks many user programs. > We should turn this method into a default method: > - Avoid breaking programs > - It is in practice less relevant for programs to react to checkpoints > being aborted than to being completed. The reason is that on completion you > often want to commit side-effects, while on abortion you frequently do not do > anything, but let the next successful checkpoint commit all changes up to > then. > *Original Confusion* > There was confusion about this originally, going back to a comment by myself > suggesting this should not be a default method, incorrectly thinking of it as > an internal interface: > https://github.com/apache/flink/pull/8693#issuecomment-542834147 > See also clarification email on the mailing list: > {noformat} > About the "notifyCheckpointAborted()": > When I wrote that comment, I was (apparently wrongly) assuming we were > talking about an > internal interface here, because the "abort" signal was originally only > intended to cancel the > async part of state backend checkpoints. > I just realized that this is exposed to users - and I am actually with Thomas > on this one. The > "CheckpointListener" is a very public interface that many users implement. > The fact that it is > tagged "@PublicEvolving" is somehow not aligned with reality. 
So adding the > method here will > in reality break lots and lots of user programs. > I think also in practice it is much less relevant for user applications to > react to aborted checkpoints. > Since the notifications there can not be relied upon (if there is a task > failure concurrently) users > always have to follow the "newer checkpoint subsumes older checkpoint" > contract, so the abort > method is probably rarely relevant. > This is something we should change, in my opinion. > {noformat} -- This message was sent by Atlassian Jira (v8.3.4#803005)
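The compatibility argument can be demonstrated with a stripped-down mock of the interface (not the real `org.apache.flink.api.common.state.CheckpointListener`):

```java
// Mock CheckpointListener: because notifyCheckpointAborted is added as a
// default method, an implementation written before the method existed
// still compiles and runs unchanged.
public class DefaultMethodCompat {

    interface CheckpointListener {
        void notifyCheckpointComplete(long checkpointId);

        // New method: default no-op, so it does not break implementors.
        default void notifyCheckpointAborted(long checkpointId) {}
    }

    // A "legacy" user class that only knows about the original method.
    static class LegacyListener implements CheckpointListener {
        long lastCompleted = -1;

        @Override
        public void notifyCheckpointComplete(long checkpointId) {
            lastCompleted = checkpointId;
        }
    }

    public static void main(String[] args) {
        LegacyListener listener = new LegacyListener();
        listener.notifyCheckpointComplete(7L);
        listener.notifyCheckpointAborted(8L); // inherited no-op, no error
        System.out.println(listener.lastCompleted); // 7
    }
}
```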
[jira] [Resolved] (FLINK-18429) Add default method for CheckpointListener.notifyCheckpointAborted(checkpointId)
[ https://issues.apache.org/jira/browse/FLINK-18429?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stephan Ewen resolved FLINK-18429. -- Resolution: Fixed Fixed in - 1.11.0 via a60a4a9c71a2648176ca46a25230920674130d01 - 1.12.0 (master) via 4776813cc335080dbe8684f51c3aa0f7f1d774d0 > Add default method for > CheckpointListener.notifyCheckpointAborted(checkpointId) > --- > > Key: FLINK-18429 > URL: https://issues.apache.org/jira/browse/FLINK-18429 > Project: Flink > Issue Type: Bug > Components: API / DataStream >Reporter: Stephan Ewen >Assignee: Stephan Ewen >Priority: Blocker > Labels: pull-request-available > Fix For: 1.11.0 > > > The {{CheckpointListener}} interface is implemented by many users. Adding a > new method {{notifyCheckpointAborted(long)}} to the interface without a > default method breaks many user programs. > We should turn this method into a default method: > - Avoid breaking programs > - It is in practice less relevant for programs to react to checkpoints > being aborted than to being completed. The reason is that on completion you > often want to commit side-effects, while on abortion you frequently do not do > anything, but let the next successful checkpoint commit all changes up to > then. > *Original Confusion* > There was confusion about this originally, going back to a comment by myself > suggesting this should not be a default method, incorrectly thinking of it as > an internal interface: > https://github.com/apache/flink/pull/8693#issuecomment-542834147 > See also clarification email on the mailing list: > {noformat} > About the "notifyCheckpointAborted()": > When I wrote that comment, I was (apparently wrongly) assuming we were > talking about an > internal interface here, because the "abort" signal was originally only > intended to cancel the > async part of state backend checkpoints. > I just realized that this is exposed to users - and I am actually with Thomas > on this one. 
The > "CheckpointListener" is a very public interface that many users implement. > The fact that it is > tagged "@PublicEvolving" is somehow not aligned with reality. So adding the > method here will > in reality break lots and lots of user programs. > I think also in practice it is much less relevant for user applications to > react to aborted checkpoints. > Since the notifications there can not be relied upon (if there is a task > failure concurrently) users > always have to follow the "newer checkpoint subsumes older checkpoint" > contract, so the abort > method is probably rarely relevant. > This is something we should change, in my opinion. > {noformat} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[GitHub] [flink] asfgit closed pull request #12767: [FLINK-18429][DataStream API] Make CheckpointListener.notifyCheckpointAborted(checkpointId) a default method.
asfgit closed pull request #12767: URL: https://github.com/apache/flink/pull/12767 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [flink] asfgit closed pull request #12766: [FLINK-18428][API/DataStream] Rename StreamExecutionEnvironment#continuousSource() to StreamExecutionEnvironment#source().
asfgit closed pull request #12766: URL: https://github.com/apache/flink/pull/12766 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [flink] flinkbot edited a comment on pull request #12699: [FLINK-18349][docs] Add release notes for Flink 1.11
flinkbot edited a comment on pull request #12699: URL: https://github.com/apache/flink/pull/12699#issuecomment-645525540 ## CI report: * d182fc4331b08fc00e5d7ad339e1f3ae79d6f7bc UNKNOWN * 768b439c76d6e261294a63b5e5d10738419054c1 UNKNOWN * a5ba950dc1eba6a27cfbb521ba42666fbb44c5e2 Azure: [FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=4014) Bot commands The @flinkbot bot supports the following commands: - `@flinkbot run travis` re-run the last Travis build - `@flinkbot run azure` re-run the last Azure build This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [flink] wanglijie95 commented on pull request #12754: [FLINK-5717][streaming] Fix NPE on SessionWindows with ContinuousProc…
wanglijie95 commented on pull request #12754: URL: https://github.com/apache/flink/pull/12754#issuecomment-648976755 > Thanks for the fix! I think, however, that `WindowOperatorTest` is not the right place to put the test, it should be added in `ContinuousProcessingTimeTriggerTest`. I know that the fix for `ContinuousEventTimeTrigger` also made this mistake but we should now fix it. For this you would first need to add the `Test`, you can do this based on `ContinuousEventTimeTriggerTest` and then add a new test for this fix there. What do you think? Thanks for the review! You are right, `ContinuousProcessingTimeTriggerTest` is more suitable. I will change it later. As for the mistake made by the fix for `ContinuousEventTimeTrigger`, I think it should be fixed in a new PR. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [flink] flinkbot edited a comment on pull request #12727: [FLINK-17292][docs] Translate Fault Tolerance training lesson to Chinese
flinkbot edited a comment on pull request #12727: URL: https://github.com/apache/flink/pull/12727#issuecomment-647010602 ## CI report: * 89242d0d76ab11a8d36234835ff121e16c92b17e Azure: [SUCCESS](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=4002) * 99d95bf94b1604274c75ad4193aec6ece9416d6d Azure: [PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=4017) * e57e583260e23ec0c0b4bbbdabd51ec2a08ac726 UNKNOWN Bot commands The @flinkbot bot supports the following commands: - `@flinkbot run travis` re-run the last Travis build - `@flinkbot run azure` re-run the last Azure build This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [flink] flinkbot edited a comment on pull request #12765: [FLINK-18417][table] Support List as a conversion class for ARRAY
flinkbot edited a comment on pull request #12765: URL: https://github.com/apache/flink/pull/12765#issuecomment-648778671 ## CI report: * 29e7c8685d8218fa538e14772fc32d945cf1c56a Azure: [SUCCESS](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=4009) Bot commands The @flinkbot bot supports the following commands: - `@flinkbot run travis` re-run the last Travis build - `@flinkbot run azure` re-run the last Azure build This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [flink] RocMarshal commented on pull request #12727: [FLINK-17292][docs] Translate Fault Tolerance training lesson to Chinese
RocMarshal commented on pull request #12727: URL: https://github.com/apache/flink/pull/12727#issuecomment-648963595 Hi @XBaith, I have made some changes based on your suggestions, which were very helpful. Thank you so much. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [flink] flinkbot edited a comment on pull request #12766: [FLINK-18428][API/DataStream] Rename StreamExecutionEnvironment#continuousSource() to StreamExecutionEnvironment#source().
flinkbot edited a comment on pull request #12766: URL: https://github.com/apache/flink/pull/12766#issuecomment-648833005 ## CI report: * 71eefff46a38b7b50afb251e5fb338f64134a821 Azure: [SUCCESS](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=4010) * f02e6446e1c33e34a613d06ff2af4098d7ec08f7 Azure: [PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=4015) Bot commands The @flinkbot bot supports the following commands: - `@flinkbot run travis` re-run the last Travis build - `@flinkbot run azure` re-run the last Azure build This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [flink] flinkbot edited a comment on pull request #12727: [FLINK-17292][docs] Translate Fault Tolerance training lesson to Chinese
flinkbot edited a comment on pull request #12727: URL: https://github.com/apache/flink/pull/12727#issuecomment-647010602 ## CI report: * 89242d0d76ab11a8d36234835ff121e16c92b17e Azure: [SUCCESS](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=4002) * 99d95bf94b1604274c75ad4193aec6ece9416d6d Azure: [PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=4017) Bot commands The @flinkbot bot supports the following commands: - `@flinkbot run travis` re-run the last Travis build - `@flinkbot run azure` re-run the last Azure build This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [flink] flinkbot edited a comment on pull request #12727: [FLINK-17292][docs] Translate Fault Tolerance training lesson to Chinese
flinkbot edited a comment on pull request #12727: URL: https://github.com/apache/flink/pull/12727#issuecomment-647010602 ## CI report: * 89242d0d76ab11a8d36234835ff121e16c92b17e Azure: [SUCCESS](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=4002) * 99d95bf94b1604274c75ad4193aec6ece9416d6d UNKNOWN
[GitHub] [flink] flinkbot edited a comment on pull request #12767: [FLINK-18429][DataStream API] Make CheckpointListener.notifyCheckpointAborted(checkpointId) a default method.
flinkbot edited a comment on pull request #12767: URL: https://github.com/apache/flink/pull/12767#issuecomment-648936333 ## CI report: * cf0c7ecf53413b9051bb02138515dce579f533f4 Azure: [PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=4016)
[GitHub] [flink] StephanEwen commented on pull request #12766: [FLINK-18428][API/DataStream] Rename StreamExecutionEnvironment#continuousSource() to StreamExecutionEnvironment#source().
StephanEwen commented on pull request #12766: URL: https://github.com/apache/flink/pull/12766#issuecomment-648947003 Merging this as part of the current batch of fixes...
[GitHub] [flink] flinkbot commented on pull request #12767: [FLINK-18429][DataStream API] Make CheckpointListener.notifyCheckpointAborted(checkpointId) a default method.
flinkbot commented on pull request #12767: URL: https://github.com/apache/flink/pull/12767#issuecomment-648936333 ## CI report: * cf0c7ecf53413b9051bb02138515dce579f533f4 UNKNOWN
[GitHub] [flink] pnowojski commented on pull request #12736: [FLINK-17800][rocksdb] Ensure total order seek to avoid user misuse
pnowojski commented on pull request #12736: URL: https://github.com/apache/flink/pull/12736#issuecomment-648935901 +1 to merge this to `release-1.11`, just please take extra care not to destabilise the build again.
[jira] [Closed] (FLINK-18116) Manually test E2E performance on Flink 1.11
[ https://issues.apache.org/jira/browse/FLINK-18116?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aihua Li closed FLINK-18116. Resolution: Done I mainly ran the stability test developed by Ali: it simulates abnormal online conditions (such as network interruption, a full disk, the JM/AM process being killed, a TM throwing an exception, etc.) to check whether the Flink job can recover automatically. The test lasted 5 hours and simulated multiple combined abnormal scenarios; the Flink job returned to normal each time and checkpoints could be created. The test passed. > Manually test E2E performance on Flink 1.11 > --- > > Key: FLINK-18116 > URL: https://issues.apache.org/jira/browse/FLINK-18116 > Project: Flink > Issue Type: Sub-task > Components: API / Core, API / DataStream, API / State Processor, > Build System, Client / Job Submission >Affects Versions: 1.11.0 >Reporter: Aihua Li >Assignee: Aihua Li >Priority: Blocker > Labels: release-testing > Fix For: 1.11.0 > > > it mainly verifies that performance is not worse than the 1.10 version by > checking the metrics of the end-to-end performance test, such as QPS and latency. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[GitHub] [flink] StephanEwen commented on pull request #12766: [FLINK-18428][API/DataStream] Rename StreamExecutionEnvironment#continuousSource() to StreamExecutionEnvironment#source().
StephanEwen commented on pull request #12766: URL: https://github.com/apache/flink/pull/12766#issuecomment-648925287 +1 from my side
[GitHub] [flink] flinkbot edited a comment on pull request #12766: [FLINK-18428][API/DataStream] Rename StreamExecutionEnvironment#continuousSource() to StreamExecutionEnvironment#source().
flinkbot edited a comment on pull request #12766: URL: https://github.com/apache/flink/pull/12766#issuecomment-648833005 ## CI report: * 71eefff46a38b7b50afb251e5fb338f64134a821 Azure: [PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=4010) * f02e6446e1c33e34a613d06ff2af4098d7ec08f7 Azure: [PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=4015)
[jira] [Commented] (FLINK-18429) Add default method for CheckpointListener.notifyCheckpointAborted(checkpointId)
[ https://issues.apache.org/jira/browse/FLINK-18429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17144001#comment-17144001 ] Stephan Ewen commented on FLINK-18429: -- With apologies to [~yunta] for the confusion I caused here. > Add default method for > CheckpointListener.notifyCheckpointAborted(checkpointId) > --- > > Key: FLINK-18429 > URL: https://issues.apache.org/jira/browse/FLINK-18429 > Project: Flink > Issue Type: Bug > Components: API / DataStream >Reporter: Stephan Ewen >Assignee: Stephan Ewen >Priority: Blocker > Labels: pull-request-available > Fix For: 1.11.0 > > > The {{CheckpointListener}} interface is implemented by many users. Adding a > new method {{notifyCheckpointAborted(long)}} to the interface without a > default method breaks many user programs. > We should turn this method into a default method: > - Avoid breaking programs > - It is in practice less relevant for programs to react to checkpoints > being aborted than to being completed. The reason is that on completion you > often want to commit side-effects, while on abortion you frequently do not do > anything, but let the next successful checkpoint commit all changes up to > then. > *Original Confusion* > There was confusion about this originally, going back to a comment by myself > suggesting this should not be a default method, incorrectly thinking of it as > an internal interface: > https://github.com/apache/flink/pull/8693#issuecomment-542834147 > See also clarification email on the mailing list: > {noformat} > About the "notifyCheckpointAborted()": > When I wrote that comment, I was (apparently wrongly) assuming we were > talking about an > internal interface here, because the "abort" signal was originally only > intended to cancel the > async part of state backend checkpoints. > I just realized that this is exposed to users - and I am actually with Thomas > on this one. 
The > "CheckpointListener" is a very public interface that many users implement. > The fact that it is > tagged "@PublicEvolving" is somehow not aligned with reality. So adding the > method here will > in reality break lots and lots of user programs. > I think also in practice it is much less relevant for user applications to > react to aborted checkpoints. > Since the notifications there can not be relied upon (if there is a task > failure concurrently) users > always have to follow the "newer checkpoint subsumes older checkpoint" > contract, so the abort > method is probably rarely relevant. > This is something we should change, in my opinion. > {noformat}
[jira] [Updated] (FLINK-18429) Add default method for CheckpointListener.notifyCheckpointAborted(checkpointId)
[ https://issues.apache.org/jira/browse/FLINK-18429?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stephan Ewen updated FLINK-18429: - Description: The {{CheckpointListener}} interface is implemented by many users. Adding a new method {{notifyCheckpointAborted(long)}} to the interface without a default method breaks many user programs. We should turn this method into a default method: - Avoid breaking programs - It is in practice less relevant for programs to react to checkpoints being aborted than to being completed. The reason is that on completion you often want to commit side-effects, while on abortion you frequently do not do anything, but let the next successful checkpoint commit all changes up to then. *Original Confusion* There was confusion about this originally, going back to a comment by myself suggesting this should not be a default method, incorrectly thinking of it as an internal interface: https://github.com/apache/flink/pull/8693#issuecomment-542834147 See also clarification email on the mailing list: {noformat} About the "notifyCheckpointAborted()": When I wrote that comment, I was (apparently wrongly) assuming we were talking about an internal interface here, because the "abort" signal was originally only intended to cancel the async part of state backend checkpoints. I just realized that this is exposed to users - and I am actually with Thomas on this one. The "CheckpointListener" is a very public interface that many users implement. The fact that it is tagged "@PublicEvolving" is somehow not aligned with reality. So adding the method here will in reality break lots and lots of user programs. I think also in practice it is much less relevant for user applications to react to aborted checkpoints. Since the notifications there can not be relied upon (if there is a task failure concurrently) users always have to follow the "newer checkpoint subsumes older checkpoint" contract, so the abort method is probably rarely relevant. 
This is something we should change, in my opinion. {noformat} was: The {{CheckpointListener}} interface is implemented by many users. Adding a new method {{notifyCheckpointAborted(long)}} to the interface without a default method breaks many user programs. We should turn this method into a default method: - Avoid breaking programs - It is in practice less relevant for programs to react to checkpoints being aborted than to being completed. The reason is that on completion you often want to commit side-effects, while on abortion you frequently do not do anything, but let the next successful checkpoint commit all changes up to then. *Original Confusion* There was confusion about this originally, going back to a comment by myself suggesting this should not be a default method, incorrectly thinking of it as an internal interface: https://github.com/apache/flink/pull/8693#issuecomment-542834147 See also clarification email on the mailing list: {noformat} About the "notifyCheckpointAborted()": When I wrote that comment, I was (apparently wrongly) assuming we were talking about an internal interface here, because the "abort" signal was originally only intended to cancel the async part of state backend checkpoints. I just realized that this is exposed to users - and I am actually with Thomas on this one. The "CheckpointListener" is a very public interface that many users implement. The fact that it is tagged "@PublicEvolving" is somehow not aligned with reality. So adding the method here will in reality break lots and lots of user programs. I think also in practice it is much less relevant for user applications to react to aborted checkpoints. Since the notifications there can not be relied upon (if there is a task failure concurrently) users always have to follow the "newer checkpoint subsumes older checkpoint" contract, so the abort method is probably rarely relevant. This is something we should change, in my opinion. 
{noformat} > Add default method for > CheckpointListener.notifyCheckpointAborted(checkpointId) > --- > > Key: FLINK-18429 > URL: https://issues.apache.org/jira/browse/FLINK-18429 > Project: Flink > Issue Type: Bug > Components: API / DataStream >Reporter: Stephan Ewen >Assignee: Stephan Ewen >Priority: Blocker > Labels: pull-request-available > Fix For: 1.11.0 > > > The {{CheckpointListener}} interface is implemented by many users. Adding a > new method {{notifyCheckpointAborted(long)}} to the interface without a > default method breaks many user programs. > We should turn this method into a default method: > - Avoid breaking programs > - It is in practice less relevant for programs to react to checkpoints > being aborted than to b
[GitHub] [flink] flinkbot commented on pull request #12767: [FLINK-18429][DataStream API] Make CheckpointListener.notifyCheckpointAborted(checkpointId) a default method.
flinkbot commented on pull request #12767: URL: https://github.com/apache/flink/pull/12767#issuecomment-648922022 Thanks a lot for your contribution to the Apache Flink project. I'm the @flinkbot. I help the community to review your pull request. We will use this comment to track the progress of the review. ## Automated Checks Last check on commit cf0c7ecf53413b9051bb02138515dce579f533f4 (Wed Jun 24 16:20:19 UTC 2020) **Warnings:** * No documentation files were touched! Remember to keep the Flink docs up to date! Mention the bot in a comment to re-run the automated checks. ## Review Progress * ❓ 1. The [description] looks good. * ❓ 2. There is [consensus] that the contribution should go into Flink. * ❓ 3. Needs [attention] from. * ❓ 4. The change fits into the overall [architecture]. * ❓ 5. Overall code [quality] is good. Please see the [Pull Request Review Guide](https://flink.apache.org/contributing/reviewing-prs.html) for a full explanation of the review process. The Bot is tracking the review progress through labels. Labels are applied according to the order of the review items. For consensus, approval by a Flink committer or PMC member is required. Bot commands The @flinkbot bot supports the following commands: - `@flinkbot approve description` to approve one or more aspects (aspects: `description`, `consensus`, `architecture` and `quality`) - `@flinkbot approve all` to approve all aspects - `@flinkbot approve-until architecture` to approve everything until `architecture` - `@flinkbot attention @username1 [@username2 ..]` to require somebody's attention - `@flinkbot disapprove architecture` to remove an approval you gave earlier
[jira] [Closed] (FLINK-18431) Provide default empty implementation of notifyCheckpointAborted
[ https://issues.apache.org/jira/browse/FLINK-18431?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Piotr Nowojski closed FLINK-18431. -- Resolution: Duplicate > Provide default empty implementation of notifyCheckpointAborted > > > Key: FLINK-18431 > URL: https://issues.apache.org/jira/browse/FLINK-18431 > Project: Flink > Issue Type: Improvement > Components: API / DataStream >Affects Versions: 1.11.0 >Reporter: Piotr Nowojski >Assignee: Piotr Nowojski >Priority: Blocker > Fix For: 1.11.0 > > > There was some confusion during reviewing of FLINK-8871 and > {{org.apache.flink.runtime.state.CheckpointListener#notifyCheckpointAborted}} > was left without a default implementation, which breaks backward > compatibility. A default empty implementation of this method should be provided.
[jira] [Created] (FLINK-18431) Provide default empty implementation of notifyCheckpointAborted
Piotr Nowojski created FLINK-18431: -- Summary: Provide default empty implementation of notifyCheckpointAborted Key: FLINK-18431 URL: https://issues.apache.org/jira/browse/FLINK-18431 Project: Flink Issue Type: Improvement Components: API / DataStream Affects Versions: 1.11.0 Reporter: Piotr Nowojski Fix For: 1.11.0 There was some confusion during reviewing of FLINK-8871 and {{org.apache.flink.runtime.state.CheckpointListener#notifyCheckpointAborted}} was left without a default implementation, which breaks backward compatibility. A default empty implementation of this method should be provided.
[jira] [Assigned] (FLINK-18431) Provide default empty implementation of notifyCheckpointAborted
[ https://issues.apache.org/jira/browse/FLINK-18431?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Piotr Nowojski reassigned FLINK-18431: -- Assignee: Piotr Nowojski > Provide default empty implementation of notifyCheckpointAborted > > > Key: FLINK-18431 > URL: https://issues.apache.org/jira/browse/FLINK-18431 > Project: Flink > Issue Type: Improvement > Components: API / DataStream >Affects Versions: 1.11.0 >Reporter: Piotr Nowojski >Assignee: Piotr Nowojski >Priority: Blocker > Fix For: 1.11.0 > > > There was some confusion during reviewing of FLINK-8871 and > {{org.apache.flink.runtime.state.CheckpointListener#notifyCheckpointAborted}} > was left without a default implementation, which breaks backward > compatibility. A default empty implementation of this method should be provided.
[jira] [Updated] (FLINK-18429) Add default method for CheckpointListener.notifyCheckpointAborted(checkpointId)
[ https://issues.apache.org/jira/browse/FLINK-18429?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated FLINK-18429: --- Labels: pull-request-available (was: ) > Add default method for > CheckpointListener.notifyCheckpointAborted(checkpointId) > --- > > Key: FLINK-18429 > URL: https://issues.apache.org/jira/browse/FLINK-18429 > Project: Flink > Issue Type: Bug > Components: API / DataStream >Reporter: Stephan Ewen >Assignee: Stephan Ewen >Priority: Blocker > Labels: pull-request-available > Fix For: 1.11.0 > > > The {{CheckpointListener}} interface is implemented by many users. Adding a > new method {{notifyCheckpointAborted(long)}} to the interface without a > default method breaks many user programs. > We should turn this method into a default method: > - Avoid breaking programs > - It is in practice less relevant for programs to react to checkpoints > being aborted than to being completed. The reason is that on completion you > often want to commit side-effects, while on abortion you frequently do not do > anything, but let the next successful checkpoint commit all changes up to > then. > *Original Confusion* > There was confusion about this originally, going back to a comment by myself > suggesting this should not be a default method, incorrectly thinking of it as > an internal interface: > https://github.com/apache/flink/pull/8693#issuecomment-542834147 > See also clarification email on the mailing list: > {noformat} > About the "notifyCheckpointAborted()": > When I wrote that comment, I was (apparently wrongly) assuming we were > talking about an internal interface here, because the "abort" signal was > originally only intended to cancel the async part of state backend > checkpoints. > I just realized that this is exposed to users - and I am actually with Thomas > on this one. The "CheckpointListener" is a very public interface that many > users implement. 
The fact that it is tagged "@PublicEvolving" is somehow not > aligned with reality. So adding the method here will in reality break lots > and lots of user programs. > I think also in practice it is much less relevant for user applications to > react to aborted checkpoints. Since the notifications there can not be relied > upon (if there is a task failure concurrently) users always have to follow > the "newer checkpoint subsumes older checkpoint" contract, so the abort > method is probably rarely relevant. > This is something we should change, in my opinion. > {noformat}
[GitHub] [flink] StephanEwen opened a new pull request #12767: [FLINK-18429][DataStream API] Make CheckpointListener.notifyCheckpointAborted(checkpointId) a default method.
StephanEwen opened a new pull request #12767: URL: https://github.com/apache/flink/pull/12767 ## Purpose / Brief change log This PR prevents breaking existing user programs by making the newly added method `CheckpointListener.notifyCheckpointAborted(checkpointId)` a default method. In addition, this PR upgrades the stability of `CheckpointListener` and `CheckpointedFunction` to `@Public` to reflect that we should not break them and make sure this is caught by our tooling. ## Original Confusion about (Not) Having a Default Method *(Copying the description from the JIRA issue here)* There was confusion about this originally, going back to a comment by myself suggesting this should not be a default method, incorrectly thinking of it as an internal interface: https://github.com/apache/flink/pull/8693#issuecomment-542834147 See clarification email on the mailing list: ``` About the "notifyCheckpointAborted()": When I wrote that comment, I was (apparently wrongly) assuming we were talking about an internal interface here, because the "abort" signal was originally only intended to cancel the async part of state backend checkpoints. I just realized that this is exposed to users - and I am actually with Thomas on this one. The "CheckpointListener" is a very public interface that many users implement. The fact that it is tagged "@PublicEvolving" is somehow not aligned with reality. So adding the method here will in reality break lots and lots of user programs. I think also in practice it is much less relevant for user applications to react to aborted checkpoints. Since the notifications there can not be relied upon (if there is a task failure concurrently) users always have to follow the "newer checkpoint subsumes older checkpoint" contract, so the abort method is probably rarely relevant. ``` ## Verifying this change This change is a trivial rework / code cleanup without any test coverage. 
## Does this pull request potentially affect one of the following parts: - Dependencies (does it add or upgrade a dependency): **no** - The public API, i.e., is any changed class annotated with `@Public(Evolving)`: **yes** - The serializers: **no** - The runtime per-record code paths (performance sensitive): **no** - Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Kubernetes/Yarn/Mesos, ZooKeeper: **no** - The S3 file system connector: **no** ## Documentation - Does this pull request introduce a new feature? **no** - If yes, how is the feature documented? **not applicable**
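The mechanism behind this PR can be sketched in plain Java (a minimal, self-contained illustration, not the actual Flink source — the real `CheckpointListener` lives in the Flink codebase with more methods and javadoc): giving the new callback a default no-op body means classes compiled against the old interface still satisfy it.

```java
// Illustrative sketch of "make notifyCheckpointAborted a default method".
// Names mirror the discussion above but this is not the real Flink interface.
public class DefaultMethodSketch {

    interface CheckpointListener {
        void notifyCheckpointComplete(long checkpointId) throws Exception;

        // Newly added callback, given a default no-op body so that user
        // implementations written before it existed still compile and run.
        default void notifyCheckpointAborted(long checkpointId) throws Exception {
            // intentionally empty: most users only react to completion,
            // letting the next successful checkpoint subsume an aborted one
        }
    }

    // A pre-existing user implementation that knows nothing about aborts.
    static class LegacyListener implements CheckpointListener {
        boolean sideEffectsCommitted = false;

        @Override
        public void notifyCheckpointComplete(long checkpointId) {
            sideEffectsCommitted = true; // e.g. commit a pending transaction
        }
    }

    public static void main(String[] args) throws Exception {
        LegacyListener listener = new LegacyListener();
        listener.notifyCheckpointComplete(1L);
        listener.notifyCheckpointAborted(2L); // inherited default, no break
        System.out.println(listener.sideEffectsCommitted); // prints "true"
    }
}
```

Without the `default` keyword, adding the method would turn `LegacyListener` into a compile error, which is exactly the breakage the PR avoids.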
[jira] [Created] (FLINK-18430) Upgrade stability to @Public for CheckpointedFunction and CheckpointListener
Stephan Ewen created FLINK-18430: Summary: Upgrade stability to @Public for CheckpointedFunction and CheckpointListener Key: FLINK-18430 URL: https://issues.apache.org/jira/browse/FLINK-18430 Project: Flink Issue Type: Improvement Components: API / DataStream Reporter: Stephan Ewen Assignee: Stephan Ewen Fix For: 1.11.0 The two interfaces {{CheckpointedFunction}} and {{CheckpointListener}} are used by many users, but are still (for years now) marked as {{@PublicEvolving}}. I think this is not correct. They are very core to the DataStream API and are used widely and should be treated as {{@Public}}.
[GitHub] [flink] flinkbot edited a comment on pull request #12766: [FLINK-18428][API/DataStream] Rename StreamExecutionEnvironment#continuousSource() to StreamExecutionEnvironment#source().
flinkbot edited a comment on pull request #12766: URL: https://github.com/apache/flink/pull/12766#issuecomment-648833005 ## CI report: * 71eefff46a38b7b50afb251e5fb338f64134a821 Azure: [PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=4010) * f02e6446e1c33e34a613d06ff2af4098d7ec08f7 UNKNOWN
[GitHub] [flink] flinkbot edited a comment on pull request #12758: [FLINK-18386][docs-zh] Translate "Print SQL Connector" page into Chinese
flinkbot edited a comment on pull request #12758: URL: https://github.com/apache/flink/pull/12758#issuecomment-648346882 ## CI report: * ca0cfac41e49d373d70c9e62bf6edc6ba9337ac8 Azure: [FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=4011)
[GitHub] [flink] flinkbot edited a comment on pull request #12765: [FLINK-18417][table] Support List as a conversion class for ARRAY
flinkbot edited a comment on pull request #12765: URL: https://github.com/apache/flink/pull/12765#issuecomment-648778671 ## CI report: * 07e0fd5e74dda6415cab5445c2ec41c1ee3375c2 Azure: [SUCCESS](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=4004) * 29e7c8685d8218fa538e14772fc32d945cf1c56a Azure: [PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=4009)
[GitHub] [flink] XBaith commented on a change in pull request #12727: [FLINK-17292][docs] Translate Fault Tolerance training lesson to Chinese
XBaith commented on a change in pull request #12727: URL: https://github.com/apache/flink/pull/12727#discussion_r444988317 ## File path: docs/learn-flink/fault_tolerance.zh.md ## @@ -29,180 +29,137 @@ under the License. ## State Backends -The keyed state managed by Flink is a sort of sharded, key/value store, and the working copy of each -item of keyed state is kept somewhere local to the taskmanager responsible for that key. Operator -state is also local to the machine(s) that need(s) it. Flink periodically takes persistent snapshots -of all the state and copies these snapshots somewhere more durable, such as a distributed file -system. +由 Flink 管理的 keyed state 是一种分片的键/值存储,每个 keyed state 的工作副本都保存在负责该键的 taskmanager 本地的某个位置。Operator state 也存在机器节点本地。Flink 定期获取所有状态的快照,并将这些快照复制到持久化的位置,例如分布式文件系统。 -In the event of the failure, Flink can restore the complete state of your application and resume -processing as though nothing had gone wrong. +如果发生故障,Flink 可以恢复应用程序的完整状态并继续处理,就如同没有出现过异常。 -This state that Flink manages is stored in a _state backend_. Two implementations of state backends -are available -- one based on RocksDB, an embedded key/value store that keeps its working state on -disk, and another heap-based state backend that keeps its working state in memory, on the Java heap. -This heap-based state backend comes in two flavors: the FsStateBackend that persists its state -snapshots to a distributed file system, and the MemoryStateBackend that uses the JobManager's heap. 
+Flink 管理的状态存储在 _state backend_ 中。Flink 有两种 state backend 的实现 -- 一种基于 RocksDB 内嵌 key/value 存储将其工作状态保存在磁盘上的,另一种基于堆的 state backend,将其工作状态保存在 Java 的堆内存中。这种基于堆的 state backend 有两种类型:FsStateBackend,将其状态快照持久化到分布式文件系统;MemoryStateBackend,它使用 JobManager 的堆保存状态快照。 - Name + 名称 Working State - State Backup - Snapshotting + 状态备份 + 快照 RocksDBStateBackend - Local disk (tmp dir) - Distributed file system - Full / Incremental + 本地磁盘(tmp dir) + 分布式文件系统 + 全量 / 增量 - Supports state larger than available memory - Rule of thumb: 10x slower than heap-based backends + 支持大于内存大小的状态 + 经验法则:比基于堆的后端慢10倍 FsStateBackend JVM Heap - Distributed file system - Full + 分布式文件系统 + 全量 - Fast, requires large heap - Subject to GC + 快速,需要大的堆内存 + 受限制于 GC MemoryStateBackend JVM Heap JobManager JVM Heap - Full + 全量 - Good for testing and experimentation with small state (locally) + 适用于小状态(本地)的测试和实验 -When working with state kept in a heap-based state backend, accesses and updates involve reading and -writing objects on the heap. But for objects kept in the `RocksDBStateBackend`, accesses and updates -involve serialization and deserialization, and so are much more expensive. But the amount of state -you can have with RocksDB is limited only by the size of the local disk. Note also that only the -`RocksDBStateBackend` is able to do incremental snapshotting, which is a significant benefit for -applications with large amounts of slowly changing state. +当使用基于堆的 state backend 保存状态时,访问和更新涉及在堆上读写对象。但是对于保存在 `RocksDBStateBackend` 中的对象,访问和更新涉及序列化和反序列化,所以会有更大的开销。但 RocksDB 的状态量仅受本地磁盘大小的限制。还要注意,只有 `RocksDBStateBackend` 能够进行增量快照,这对于具有大量变化缓慢状态的应用程序来说是大有裨益的。 -All of these state backends are able to do asynchronous snapshotting, meaning that they can take a -snapshot without impeding the ongoing stream processing. 
+所有这些 state backends 都能够异步执行快照,这意味着它们可以在不妨碍正在进行的流处理的情况下执行快照。 {% top %} -## State Snapshots +## 状态快照 -### Definitions +### 定义 -* _Snapshot_ -- a generic term referring to a global, consistent image of the state of a Flink job. - A snapshot includes a pointer into each of the data sources (e.g., an offset into a file or Kafka - partition), as well as a copy of the state from each of the job's stateful operators that resulted - from having processed all of the events up to those positions in the sources. -* _Checkpoint_ -- a snapshot taken automatically by Flink for the purpose of being able to recover - from faults. Checkpoints can be incremental, and are optimized for being restored quickly. -* _Externalized Checkpoint_ -- normally checkpoints are not intended to be manipulated by users. - Flink retains only the _n_-most-recent checkpoints (_n_ being configurable) while a job is - running, and deletes them when a job is cancelled. But you can configure them to be retained - instead, in which case you can manually resume from them. -* _Savepoint_ -- a snapshot triggered manually by a user (or an API call) for some operational - purpose, such as a stateful redeploy/upgrade/rescaling ope
[GitHub] [flink] austince commented on pull request #12056: [FLINK-17502] [flink-connector-rabbitmq] RMQSource refactor
austince commented on pull request #12056: URL: https://github.com/apache/flink/pull/12056#issuecomment-648908629 @senegalo feel better! I don't think there's any rush for our stuff since it's 1.11 release season as well! This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Commented] (FLINK-18427) Job failed under java 11
[ https://issues.apache.org/jira/browse/FLINK-18427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17143951#comment-17143951 ] Stephan Ewen commented on FLINK-18427: -- Thanks for reporting this. I think the reason is that under Java 11, Netty will allocate memory from the pool of Java Direct Memory and is affected by the MaxDirectMemory limit. Under Java 8, it allocates native memory and is not affected by that setting. For Flink 1.10.0 you probably need an increased value for "Framework Offheap Memory" in higher parallelism cases. This documentation page has more details: https://ci.apache.org/projects/flink/flink-docs-release-1.10/ops/memory/mem_setup.html#configure-off-heap-memory-direct-or-native Flink 1.11.0 should reduce the Netty memory usage significantly, by directly reading into Flink's managed memory on the receiver side. [~zjwang], [~pnowojski] would know more details about this. > Job failed under java 11 > > > Key: FLINK-18427 > URL: https://issues.apache.org/jira/browse/FLINK-18427 > Project: Flink > Issue Type: Bug > Components: API / Core, API / DataStream >Reporter: Zhang Hao >Priority: Critical > > flink version:1.10.0 > deployment mode:cluster > os:linux redhat7.5 > Job parallelism:greater than 1 > My job runs normally under Java 8, but fails under Java 11. Exception info > like below: netty send message failed. In addition, I found the job would fail > when tasks were distributed on multiple nodes; if I set the job's parallelism = 1, the job > runs normally under Java 11 too. > > 2020-06-24 09:52:162020-06-24 > 09:52:16org.apache.flink.runtime.io.network.netty.exception.LocalTransportException: > Sending the partition request to '/170.0.50.19:33320' failed. 
at > org.apache.flink.runtime.io.network.netty.NettyPartitionRequestClient$1.operationComplete(NettyPartitionRequestClient.java:124) > at > org.apache.flink.runtime.io.network.netty.NettyPartitionRequestClient$1.operationComplete(NettyPartitionRequestClient.java:115) > at > org.apache.flink.shaded.netty4.io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:500) > at > org.apache.flink.shaded.netty4.io.netty.util.concurrent.DefaultPromise.notifyListenersNow(DefaultPromise.java:474) > at > org.apache.flink.shaded.netty4.io.netty.util.concurrent.DefaultPromise.notifyListeners(DefaultPromise.java:413) > at > org.apache.flink.shaded.netty4.io.netty.util.concurrent.DefaultPromise.setValue0(DefaultPromise.java:538) > at > org.apache.flink.shaded.netty4.io.netty.util.concurrent.DefaultPromise.setFailure0(DefaultPromise.java:531) > at > org.apache.flink.shaded.netty4.io.netty.util.concurrent.DefaultPromise.tryFailure(DefaultPromise.java:111) > at > org.apache.flink.shaded.netty4.io.netty.util.internal.PromiseNotificationUtil.tryFailure(PromiseNotificationUtil.java:64) > at > org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.notifyOutboundHandlerException(AbstractChannelHandlerContext.java:818) > at > org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.invokeWrite0(AbstractChannelHandlerContext.java:718) > at > org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.invokeWrite(AbstractChannelHandlerContext.java:708) > at > org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.access$1700(AbstractChannelHandlerContext.java:56) > at > org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext$AbstractWriteTask.write(AbstractChannelHandlerContext.java:1102) > at > org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext$WriteAndFlushTask.write(AbstractChannelHandlerContext.java:1149) > at > 
org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext$AbstractWriteTask.run(AbstractChannelHandlerContext.java:1073) > at > org.apache.flink.shaded.netty4.io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:163) > at > org.apache.flink.shaded.netty4.io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:416) > at > org.apache.flink.shaded.netty4.io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:515) > at > org.apache.flink.shaded.netty4.io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:918) > at > org.apache.flink.shaded.netty4.io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) > at java.base/java.lang.Thread.run(Thread.java:834)Caused by: > java.io.IOException: Error while serializing message: > PartitionRequest(8059a0b47f7ba0ff814ea52427c584e7@6750c1170c861176ad3ceefe9b02f36e:0:2) > at > org.apache.flink.runtime.io.network.netty.NettyMessage$NettyMessageEncoder.write(NettyMessage.java:177) > at
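A minimal `flink-conf.yaml` sketch of the workaround described in the comment above for Flink 1.10 (the size shown is illustrative, not a recommended value; the appropriate amount depends on parallelism):

```yaml
# Raise the framework off-heap budget so the shaded Netty's direct-memory
# allocations stay under the MaxDirectMemory limit on Java 11
# (default in Flink 1.10 is 128m)
taskmanager.memory.framework.off-heap.size: 256m
```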
[GitHub] [flink] becketqin commented on pull request #12766: [FLINK-18428][API/DataStream] Rename StreamExecutionEnvironment#continuousSource() to StreamExecutionEnvironment#source().
becketqin commented on pull request #12766: URL: https://github.com/apache/flink/pull/12766#issuecomment-648897424 @twalthr @dawidwys Thanks for the review. I'll merge the patch after the CI tests pass. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Created] (FLINK-18429) Add default method for CheckpointListener.notifyCheckpointAborted(checkpointId)
Stephan Ewen created FLINK-18429: Summary: Add default method for CheckpointListener.notifyCheckpointAborted(checkpointId) Key: FLINK-18429 URL: https://issues.apache.org/jira/browse/FLINK-18429 Project: Flink Issue Type: Bug Components: API / DataStream Reporter: Stephan Ewen Assignee: Stephan Ewen Fix For: 1.11.0 The {{CheckpointListener}} interface is implemented by many users. Adding a new method {{notifyCheckpointAborted(long)}} to the interface without a default method breaks many user programs. We should turn this method into a default method: - Avoid breaking programs - It is in practice less relevant for programs to react to checkpoints being aborted then to being completed. The reason is that on completion you often want to commit side-effects, while on abortion you frequently do not do anything, but let the next successful checkpoint commit all changes up to then. *Original Confusion* There was confusion about this originally, going back to a comment by myself suggesting this should not be a default method, incorrectly thinking of it as an internal interface: https://github.com/apache/flink/pull/8693#issuecomment-542834147 See also clarification email on the mailing list: {noformat} About the "notifyCheckpointAborted()": When I wrote that comment, I was (apparently wrongly) assuming we were talking about an internal interface here, because the "abort" signal was originally only intended to cancel the async part of state backend checkpoints. I just realized that this is exposed to users - and I am actually with Thomas on this one. The "CheckpointListener" is a very public interface that many users implement. The fact that it is tagged "@PublicEvolving" is somehow not aligned with reality. So adding the method here will in reality break lots and lots of user programs. I think also in practice it is much less relevant for user applications to react to aborted checkpoints. 
Since the notifications there can not be relied upon (if there is a task failure concurrently) users always have to follow the "newer checkpoint subsumes older checkpoint" contract, so the abort method is probably rarely relevant. This is something we should change, in my opinion. {noformat} -- This message was sent by Atlassian Jira (v8.3.4#803005)
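The proposed change amounts to giving the new method a no-op default body, so that existing implementations keep compiling unchanged. A standalone sketch of the shape (not the actual Flink source):

```java
// Sketch of the interface after the proposed change: one pre-existing
// abstract method, one new method with a default no-op implementation.
public interface CheckpointListener {
    // existing method, implemented by many user programs
    void notifyCheckpointComplete(long checkpointId) throws Exception;

    // new method: default no-op, so user classes written against the
    // old interface are not broken by the addition
    default void notifyCheckpointAborted(long checkpointId) throws Exception {
        // typically nothing to do on abort: the next successful
        // checkpoint subsumes and commits all changes up to then
    }
}
```

With the default in place, an implementation that only overrides `notifyCheckpointComplete` still satisfies the interface.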
[GitHub] [flink] becketqin commented on pull request #12766: [FLINK-18428][API/DataStream] Rename StreamExecutionEnvironment#continuousSource() to StreamExecutionEnvironment#source().
becketqin commented on pull request #12766: URL: https://github.com/apache/flink/pull/12766#issuecomment-648890887 @twalthr I think the clearest naming is either `StreamExecutionEnvironment#dataStreamFromSource()` or `DataStream#fromSource()`. But given the current status, I agree that `StreamExecutionEnvironment#fromSource()` is better than just `source()`. At least this would give a consistent style after we deprecate `readFile()`, `addSource()` and `createInput()`. I'll update the patch. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [flink] StephanEwen commented on pull request #12736: [FLINK-17800][rocksdb] Ensure total order seek to avoid user misuse
StephanEwen commented on pull request #12736: URL: https://github.com/apache/flink/pull/12736#issuecomment-64155 Looks fine from my side, thanks. +1 to merge into `master`. About merging to `release-1.11` - this is up to @pnowojski and @zhijiangW as release managers. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [flink] flinkbot edited a comment on pull request #12699: [FLINK-18349][docs] Add release notes for Flink 1.11
flinkbot edited a comment on pull request #12699: URL: https://github.com/apache/flink/pull/12699#issuecomment-645525540 ## CI report: * d182fc4331b08fc00e5d7ad339e1f3ae79d6f7bc UNKNOWN * 2a876970e69c0f14f31125441e92432aab3ac817 Azure: [FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=3778) * 768b439c76d6e261294a63b5e5d10738419054c1 UNKNOWN * a5ba950dc1eba6a27cfbb521ba42666fbb44c5e2 Azure: [PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=4014) Bot commands The @flinkbot bot supports the following commands: - `@flinkbot run travis` re-run the last Travis build - `@flinkbot run azure` re-run the last Azure build This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [flink] dawidwys commented on pull request #12766: [FLINK-18428][API/DataStream] Rename StreamExecutionEnvironment#continuousSource() to StreamExecutionEnvironment#source().
dawidwys commented on pull request #12766: URL: https://github.com/apache/flink/pull/12766#issuecomment-648871161 +1 from my side Personally I am fine with either `fromSource`/`addSource`/`source`. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [flink] flinkbot edited a comment on pull request #12758: [FLINK-18386][docs-zh] Translate "Print SQL Connector" page into Chinese
flinkbot edited a comment on pull request #12758: URL: https://github.com/apache/flink/pull/12758#issuecomment-648346882 ## CI report: * e14e00b7039fbac4800a8ef83d04e02bd621824c Azure: [SUCCESS](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=3979) * ca0cfac41e49d373d70c9e62bf6edc6ba9337ac8 Azure: [PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=4011) Bot commands The @flinkbot bot supports the following commands: - `@flinkbot run travis` re-run the last Travis build - `@flinkbot run azure` re-run the last Azure build This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [flink] flinkbot edited a comment on pull request #12699: [FLINK-18349][docs] Add release notes for Flink 1.11
flinkbot edited a comment on pull request #12699: URL: https://github.com/apache/flink/pull/12699#issuecomment-645525540 ## CI report: * d182fc4331b08fc00e5d7ad339e1f3ae79d6f7bc UNKNOWN * 2a876970e69c0f14f31125441e92432aab3ac817 Azure: [FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=3778) * 768b439c76d6e261294a63b5e5d10738419054c1 UNKNOWN * a5ba950dc1eba6a27cfbb521ba42666fbb44c5e2 UNKNOWN Bot commands The @flinkbot bot supports the following commands: - `@flinkbot run travis` re-run the last Travis build - `@flinkbot run azure` re-run the last Azure build This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [flink] twalthr edited a comment on pull request #12766: [FLINK-18428][API/DataStream] Rename StreamExecutionEnvironment#continuousSource() to StreamExecutionEnvironment#source().
twalthr edited a comment on pull request #12766: URL: https://github.com/apache/flink/pull/12766#issuecomment-648860657 Why don't we call this method `fromSource`? We have `fromElements`, `fromCollection`, `fromParallelCollection`. It is confusing that we also have `addSource`, `readFile`, and `createInput`. Now we add another flavor that is just called `source`. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [flink] twalthr commented on pull request #12766: [FLINK-18428][API/DataStream] Rename StreamExecutionEnvironment#continuousSource() to StreamExecutionEnvironment#source().
twalthr commented on pull request #12766: URL: https://github.com/apache/flink/pull/12766#issuecomment-648860657 Why don't we call this method `fromSource`? We have `fromElements`, `fromCollection`, `fromParallelCollection`. It is confusing that we also have `addSource`, `readFile`, and `createInput`. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Closed] (FLINK-18426) Incompatible deprecated key type for registration cluster options
[ https://issues.apache.org/jira/browse/FLINK-18426?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Till Rohrmann closed FLINK-18426. - Resolution: Fixed Fixed via 1.12.0: ca534016f90efd348bc9596671f067f99ff3ae19 1.11.1: 62c7265522fcf1b708b4906d5d74e40188a80f28 > Incompatible deprecated key type for registration cluster options > -- > > Key: FLINK-18426 > URL: https://issues.apache.org/jira/browse/FLINK-18426 > Project: Flink > Issue Type: Bug > Components: Runtime / Configuration, Runtime / Coordination >Affects Versions: 1.11.0 >Reporter: Till Rohrmann >Assignee: Till Rohrmann >Priority: Critical > Labels: pull-request-available > Fix For: 1.12.0, 1.11.1 > > > With FLINK-15827 we deprecated unused {{TaskManagerOptions}}. As part of this > deprecation, we added them as deprecated keys for a couple of > {{ClusterOptions}}. The problem is that the deprecated keys are of type > {{Duration}} whereas the valid options are of type {{Long}}. Hence, the > system will fail if a deprecated config option has been configured because it > cannot be parsed as a long. > In order to solve the problem, I propose to remove the deprecated keys from > the new {{ClusterOptions}}. -- This message was sent by Atlassian Jira (v8.3.4#803005)
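The failure mode described in the issue can be reproduced in isolation: a value written for a Duration-typed key (e.g. "30 s") cannot be parsed by a Long-typed option. A minimal sketch, not Flink's actual config-parsing code:

```java
public class DeprecatedKeyTypeMismatch {
    public static void main(String[] args) {
        // a value that is valid for a Duration-typed key
        String configuredValue = "30 s";
        try {
            // what a Long-typed option would attempt on the same value
            Long.parseLong(configuredValue);
            System.out.println("parsed");
        } catch (NumberFormatException e) {
            // this is why the system fails when the deprecated key is set
            System.out.println("NumberFormatException");
        }
    }
}
```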
[GitHub] [flink] tillrohrmann closed pull request #12763: [FLINK-18426] Remove incompatible deprecated keys from ClusterOptions
tillrohrmann closed pull request #12763: URL: https://github.com/apache/flink/pull/12763 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [flink] tillrohrmann commented on pull request #12763: [FLINK-18426] Remove incompatible deprecated keys from ClusterOptions
tillrohrmann commented on pull request #12763: URL: https://github.com/apache/flink/pull/12763#issuecomment-648854167 Thanks for the review @zentol. The failing test case is unrelated. Merging this PR now. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [flink] RocMarshal commented on pull request #12764: [FLINK-18423][docs] Fix Prefer tag in document "Detecting Patterns" age of "Streaming Concepts"
RocMarshal commented on pull request #12764: URL: https://github.com/apache/flink/pull/12764#issuecomment-648846910 Hi, @wuchong I have updated 'Prefer tag' in documentation "Detecting Patterns" page of "Streaming Concepts" according to [Prefer Reminder](http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/Reminder-Prefer-link-tag-in-documentation-td42362.html).The markdown file location is: flink/docs/dev/table/streaming/match_recognize.md When I translate this page into Chinese, I will update the reference links accordingly(https://issues.apache.org/jira/browse/FLINK-16087). And could you help me to review this PR(https://github.com/apache/flink/pull/12764)? Thank you. Best, Roc. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [flink] flinkbot edited a comment on pull request #12766: [FLINK-18428][API/DataStream] Rename StreamExecutionEnvironment#continuousSource() to StreamExecutionEnvironment#source().
flinkbot edited a comment on pull request #12766: URL: https://github.com/apache/flink/pull/12766#issuecomment-648833005 ## CI report: * 71eefff46a38b7b50afb251e5fb338f64134a821 Azure: [PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=4010) Bot commands The @flinkbot bot supports the following commands: - `@flinkbot run travis` re-run the last Travis build - `@flinkbot run azure` re-run the last Azure build This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [flink] flinkbot edited a comment on pull request #12758: [FLINK-18386][docs-zh] Translate "Print SQL Connector" page into Chinese
flinkbot edited a comment on pull request #12758: URL: https://github.com/apache/flink/pull/12758#issuecomment-648346882 ## CI report: * e14e00b7039fbac4800a8ef83d04e02bd621824c Azure: [SUCCESS](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=3979) * ca0cfac41e49d373d70c9e62bf6edc6ba9337ac8 UNKNOWN Bot commands The @flinkbot bot supports the following commands: - `@flinkbot run travis` re-run the last Travis build - `@flinkbot run azure` re-run the last Azure build This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [flink] flinkbot edited a comment on pull request #12765: [FLINK-18417][table] Support List as a conversion class for ARRAY
flinkbot edited a comment on pull request #12765: URL: https://github.com/apache/flink/pull/12765#issuecomment-648778671 ## CI report: * 07e0fd5e74dda6415cab5445c2ec41c1ee3375c2 Azure: [PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=4004) * 29e7c8685d8218fa538e14772fc32d945cf1c56a Azure: [PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=4009) Bot commands The @flinkbot bot supports the following commands: - `@flinkbot run travis` re-run the last Travis build - `@flinkbot run azure` re-run the last Azure build This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [flink] flinkbot edited a comment on pull request #12761: [FLINK-17465][doc-zh] Update Chinese translations for memory configuration documents.
flinkbot edited a comment on pull request #12761: URL: https://github.com/apache/flink/pull/12761#issuecomment-648644461 ## CI report: * 5231e75f7bbc13756b825a82ec7394521d75d5ad Azure: [SUCCESS](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=4003) Bot commands The @flinkbot bot supports the following commands: - `@flinkbot run travis` re-run the last Travis build - `@flinkbot run azure` re-run the last Azure build This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [flink] twalthr commented on a change in pull request #12542: [FLINK-18168][table-runtime-blink] return a copy of reuseArray in ObjectArrayConverter.toBinaryArray()
twalthr commented on a change in pull request #12542: URL: https://github.com/apache/flink/pull/12542#discussion_r444918058 ## File path: flink-table/flink-table-runtime-blink/src/main/java/org/apache/flink/table/data/conversion/ArrayObjectArrayConverter.java ## @@ -154,7 +154,7 @@ void writeElement(int pos, E element) { BinaryArrayData completeWriter() { reuseWriter.complete(); - return reuseArray; + return reuseArray.copy(); Review comment: I moved the `copy()` to the outer level in `toBinaryArrayData` because this method is shared with `MapMapConverter` which would perform a copy twice. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [flink] flinkbot edited a comment on pull request #12727: [FLINK-17292][docs] Translate Fault Tolerance training lesson to Chinese
flinkbot edited a comment on pull request #12727: URL: https://github.com/apache/flink/pull/12727#issuecomment-647010602 ## CI report: * 89242d0d76ab11a8d36234835ff121e16c92b17e Azure: [SUCCESS](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=4002) Bot commands The @flinkbot bot supports the following commands: - `@flinkbot run travis` re-run the last Travis build - `@flinkbot run azure` re-run the last Azure build This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [flink] flinkbot edited a comment on pull request #12699: [FLINK-18349][docs] Add release notes for Flink 1.11
flinkbot edited a comment on pull request #12699: URL: https://github.com/apache/flink/pull/12699#issuecomment-645525540 ## CI report: * d182fc4331b08fc00e5d7ad339e1f3ae79d6f7bc UNKNOWN * 2a876970e69c0f14f31125441e92432aab3ac817 Azure: [FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=3778) * 768b439c76d6e261294a63b5e5d10738419054c1 UNKNOWN Bot commands The @flinkbot bot supports the following commands: - `@flinkbot run travis` re-run the last Travis build - `@flinkbot run azure` re-run the last Azure build This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [flink] flinkbot commented on pull request #12766: [FLINK-18428][API/DataStream] Rename StreamExecutionEnvironment#continuousSource() to StreamExecutionEnvironment#source().
flinkbot commented on pull request #12766: URL: https://github.com/apache/flink/pull/12766#issuecomment-648833005 ## CI report: * 71eefff46a38b7b50afb251e5fb338f64134a821 UNKNOWN Bot commands The @flinkbot bot supports the following commands: - `@flinkbot run travis` re-run the last Travis build - `@flinkbot run azure` re-run the last Azure build This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [flink] flinkbot edited a comment on pull request #12765: [FLINK-18417][table] Support List as a conversion class for ARRAY
flinkbot edited a comment on pull request #12765: URL: https://github.com/apache/flink/pull/12765#issuecomment-648778671 ## CI report: * 07e0fd5e74dda6415cab5445c2ec41c1ee3375c2 Azure: [PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=4004) * 29e7c8685d8218fa538e14772fc32d945cf1c56a UNKNOWN Bot commands The @flinkbot bot supports the following commands: - `@flinkbot run travis` re-run the last Travis build - `@flinkbot run azure` re-run the last Azure build This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org