Re: [VOTE] Release 1.8.0, release candidate #4
@Yu discovered this issue, which IMO is probably a blocker for the release:
https://issues.apache.org/jira/browse/FLINK-12064. The bug is a regression
caused by previous state backend refactorings, and can result in an incorrect
representation of the schema of serialized keys in state because the wrong key
serializer instance is being snapshotted. There is already a PR to fix this;
I'll try to review and merge it over the weekend.

On Fri, Mar 29, 2019 at 7:13 AM Richard Deurwaarder wrote:

> -1 (non-binding)
>
> - Ran integration tests (1000+) of our Flink job locally, all succeeded.
> - Attempted to run the job on Hadoop, which failed. It failed because we
>   have a firewall in place and we cannot set the REST port to a specific
>   port/port range.
>   Unless I am mistaken, it seems like FLINK-11081 broke the possibility of
>   setting a REST port when running on YARN (
>   https://github.com/apache/flink/commit/730eed71ef3f718d61f85d5e94b1060844ca56db#diff-487838863ab693af7008f04cb3359be3R102
>   ).
>   Code-wise it seems rather straightforward to fix, but I am unsure why
>   this is hard-coded to 0 and what the impact of changing it would be.
>
> It would benefit us greatly if a fix for this could make it into 1.8.0.
>
> Regards,
>
> Richard
>
> On Thu, Mar 28, 2019 at 9:54 AM Tzu-Li (Gordon) Tai wrote:
>
> > +1 (binding)
> >
> > Functional checks:
> >
> > - Built Flink from source (`mvn clean verify`) locally, with success
> > - Ran end-to-end tests locally 5 times in a loop, no attempts failed
> >   (Hadoop 2.8.4, Scala 2.12)
> > - Manually tested state schema evolution for POJOs. Besides the tests
> >   that @Congxian already did, additionally tested evolution cases with
> >   POJO subclasses + non-registered POJOs.
> > - Manually tested migration of Scala stateful jobs that use case
> >   classes / Scala collections as state types, performing the migration
> >   from Scala 2.11 to Scala 2.12.
> > - Reviewed the release announcement PR
> >
> > Misc / legal checks:
> >
> > - Checked checksums and signatures
> > - No binaries in the source distribution
> > - The staging area does not seem to have any missing artifacts
> >
> > Cheers,
> > Gordon
> >
> > On Thu, Mar 28, 2019 at 4:52 PM Tzu-Li (Gordon) Tai wrote:
> >
> > > @Shaoxuan
> > >
> > > The drop in the serializerAvro benchmark, as explained in the voting
> > > threads of earlier RCs, was due to a slower job initialization phase
> > > caused by slower deserialization of the AvroSerializer.
> > > Piotr also pointed out that after the number of records was increased
> > > in the serializer benchmarks, this drop was no longer observable
> > > before / after the changes in mid February.
> > > IMO, this is not critical as it does not affect the per-record
> > > performance / throughput, and therefore should not block this release.
> > >
> > > On Thu, Mar 28, 2019 at 1:08 AM Aljoscha Krettek
> > > <aljos...@fastmail.com> wrote:
> > >
> > >> By now, I'm reasonably sure that the instabilities in the end-to-end
> > >> tests are only instabilities. I pushed changes to increase timeouts
> > >> to make the tests more stable. As in any project, there will always
> > >> be bugs, but I think we can release this RC4 and be reasonably sure
> > >> that it works well.
> > >>
> > >> Now we only need the required number of PMC votes.
> > >>
> > >> On Wed, Mar 27, 2019, at 07:22, Congxian Qiu wrote:
> > >> > +1 (non-binding)
> > >> >
> > >> > • Checked signature and checksum: ok
> > >> > • mvn clean package -DskipTests: ok
> > >> > • Ran job on YARN: ok
> > >> > • Tested state migration with POJO type (both heap and RocksDB): ok
> > >> >   - 1.6 -> 1.8
> > >> >   - 1.7 -> 1.8
> > >> >   - 1.8 -> 1.8
> > >> >
> > >> > Best, Congxian
> > >> > On Mar 27, 2019, 10:26 +0800, vino yang wrote:
> > >> > > +1 (non-binding)
> > >> > >
> > >> > > - Checked the JIRA release notes
> > >> > > - Ran "mvn package -DskipTests"
> > >> > > - Checked signature and checksum
> > >> > > - Started a cluster locally and ran some examples from the binary
> > >> > > - Checked the web site announcement PR
> > >> > >
> > >> > > Best,
> > >> > > Vino
> > >> > >
> > >> > > Xiaowei Jiang wrote on Tue, Mar 26, 2019 at 8:20 PM:
> > >> > >
> > >> > > > +1 (non-binding)
> > >> > > >
> > >> > > > - Checked checksums and GPG files
> > >> > > > - Built from source successfully
> > >> > > > - Ran end-to-end precommit tests successfully
> > >> > > > - Ran end-to-end nightly tests successfully
> > >> > > >
> > >> > > > Xiaowei
> > >> > > > On Tuesday, March 26, 2019, 8:09:19 PM GMT+8, Yu Li
> > >> > > > <car...@gmail.com> wrote:
> > >> > > >
> > >> > > > +1 (non-binding)
> > >> > > >
> > >> > > > - Checked release notes: OK
> > >> > > > - Checked sums and signatures: OK
> > >> > > > - Source release
> > >> > > >   - contains no binaries: OK
> > >> > > >   - contains no 1.8-SNAPSHOT references: OK
> > >> > > >   - build from source: OK (8u101)
> > >> > > >   - mvn clean verify: OK (8u101)
> > >> > > > - Binary
[jira] [Created] (FLINK-12070) Make blocking result partitions consumable multiple times
Till Rohrmann created FLINK-12070:
----------------------------------

             Summary: Make blocking result partitions consumable multiple times
                 Key: FLINK-12070
                 URL: https://issues.apache.org/jira/browse/FLINK-12070
             Project: Flink
          Issue Type: Improvement
          Components: Runtime / Network
            Reporter: Till Rohrmann


In order to avoid writing produced results multiple times for multiple
consumers, and in order to speed up batch recoveries, we should make blocking
result partitions consumable multiple times. At the moment, a blocking result
partition is released once its consumers have processed all data. Instead,
the result partition should be released once the next blocking result has
been produced and all consumers of the blocking result partition have
terminated. Moreover, blocking results should not hold on to slot resources
such as network buffers or memory, as is currently the case with
{{SpillableSubpartitions}}.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
[jira] [Created] (FLINK-12069) Add proper lifecycle management for intermediate result partitions
Till Rohrmann created FLINK-12069:
----------------------------------

             Summary: Add proper lifecycle management for intermediate result partitions
                 Key: FLINK-12069
                 URL: https://issues.apache.org/jira/browse/FLINK-12069
             Project: Flink
          Issue Type: Improvement
          Components: Runtime / Coordination, Runtime / Network
    Affects Versions: 1.8.0, 1.9.0
            Reporter: Till Rohrmann


In order to properly execute batch jobs, we should make the lifecycle
management of intermediate result partitions the responsibility of the
{{JobMaster}}/{{Scheduler}} component. The {{Scheduler}} knows best when an
intermediate result partition is no longer needed and can thus be freed. For
example, a blocking intermediate result should only be released after all
subsequent blocking intermediate results have been completed, in order to
speed up potential failovers. Moreover, having explicit control over
intermediate result partitions could also enable use cases like result
partition sharing between jobs, and even across clusters (by simply not
releasing the result partitions).
[jira] [Created] (FLINK-12068) Backtrack fail over regions if intermediate results are unavailable
Till Rohrmann created FLINK-12068:
----------------------------------

             Summary: Backtrack fail over regions if intermediate results are unavailable
                 Key: FLINK-12068
                 URL: https://issues.apache.org/jira/browse/FLINK-12068
             Project: Flink
          Issue Type: Sub-task
          Components: Runtime / Coordination
            Reporter: Till Rohrmann


The batch failover strategy needs to be able to backtrack fail over regions
if an intermediate result is unavailable, either by explicitly checking
whether the intermediate result partition is available, or via a special
exception indicating that a result partition is no longer available.
[jira] [Created] (FLINK-12067) Refactor the constructor of NetworkEnvironment
zhijiang created FLINK-12067:
-----------------------------

             Summary: Refactor the constructor of NetworkEnvironment
                 Key: FLINK-12067
                 URL: https://issues.apache.org/jira/browse/FLINK-12067
             Project: Flink
          Issue Type: Sub-task
          Components: Runtime / Network
            Reporter: zhijiang
            Assignee: zhijiang


The constructor of {{NetworkEnvironment}} could be refactored to only take a
{{NetworkEnvironmentConfiguration}}; the other related components, such as
{{TaskEventDispatcher}}, {{ResultPartitionManager}} and {{NetworkBufferPool}},
could be created internally. We would also refactor the process of generating
the {{NetworkEnvironmentConfiguration}} in {{TaskManagerServiceConfiguration}}
so that it carries {{numNetworkBuffers}} instead of the previous
{{networkBufFraction}}, {{networkBufMin}} and {{networkBufMax}}. Furthermore,
we introduce a {{NetworkEnvironmentConfigurationBuilder}} for creating
{{NetworkEnvironmentConfiguration}} instances easily, especially in tests.
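The test-friendly builder idea can be sketched as follows. This is a minimal illustration, not Flink's actual class: {{numNetworkBuffers}} comes from the ticket, while the second field (pageSize) and all defaults are invented for the example.

```java
// Illustrative builder sketch; field names other than numNetworkBuffers and
// all default values are assumptions, not Flink's actual configuration.
class NetConfSketch {
    final int numNetworkBuffers;
    final int pageSize;

    private NetConfSketch(int numNetworkBuffers, int pageSize) {
        this.numNetworkBuffers = numNetworkBuffers;
        this.pageSize = pageSize;
    }

    static class Builder {
        // Test-friendly defaults: tests only override the fields they care about.
        private int numNetworkBuffers = 1024;
        private int pageSize = 32 * 1024;

        Builder setNumNetworkBuffers(int n) { this.numNetworkBuffers = n; return this; }
        Builder setPageSize(int p) { this.pageSize = p; return this; }

        NetConfSketch build() { return new NetConfSketch(numNetworkBuffers, pageSize); }
    }

    public static void main(String[] args) {
        NetConfSketch conf = new NetConfSketch.Builder().setNumNetworkBuffers(8).build();
        System.out.println(conf.numNetworkBuffers + " buffers, page size " + conf.pageSize);
    }
}
```

The benefit for tests is that only the one parameter under test is spelled out; everything else falls back to a sensible default.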
[jira] [Created] (FLINK-12066) Remove StateSerializerProvider field in keyed state backend
Yu Li created FLINK-12066:
--------------------------

             Summary: Remove StateSerializerProvider field in keyed state backend
                 Key: FLINK-12066
                 URL: https://issues.apache.org/jira/browse/FLINK-12066
             Project: Flink
          Issue Type: Improvement
          Components: Runtime / State Backends
            Reporter: Yu Li
            Assignee: Yu Li
             Fix For: 1.9.0


As mentioned in the [PR review of FLINK-10043|https://github.com/apache/flink/pull/7674#discussion_r257630962]
with Stefan, and in an offline discussion with Gordon: after the refactoring
work, the serializer passed to the {{RocksDBKeyedStateBackend}} constructor is
a final one, so the {{StateSerializerProvider}} field is no longer needed. For
{{HeapKeyedStateBackend}}, the only thing stopping us from passing a final
serializer is the circular dependency between the backend and
{{HeapRestoreOperation}}, and we aim to decouple them by introducing a new
{{HeapInternalKeyContext}} as the bridge. For more details, please refer to
the upcoming PR.
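The bridge idea can be illustrated with a small sketch. All names below are hypothetical stand-ins, not Flink's actual classes: the restore operation depends only on a narrow key-context interface that the backend implements, so the two classes no longer reference each other's full types.

```java
// Hypothetical sketch of breaking a circular dependency via a narrow
// bridge interface; names mirror the idea, not actual Flink classes.
interface KeyContextSketch {
    void setCurrentKey(String key);
    String getCurrentKey();
}

// The restore operation only sees the bridge interface, not the full backend.
class RestoreOperationSketch {
    private final KeyContextSketch keyContext;

    RestoreOperationSketch(KeyContextSketch keyContext) {
        this.keyContext = keyContext;
    }

    void restoreRecord(String key) {
        keyContext.setCurrentKey(key); // no back-reference to the backend type
    }
}

// The backend implements the bridge, handing the restore operation
// only the narrow surface it actually needs.
class BackendSketch implements KeyContextSketch {
    private String currentKey;

    @Override
    public void setCurrentKey(String key) { this.currentKey = key; }

    @Override
    public String getCurrentKey() { return currentKey; }

    public static void main(String[] args) {
        BackendSketch backend = new BackendSketch();
        new RestoreOperationSketch(backend).restoreRecord("key-1");
        System.out.println(backend.getCurrentKey());
    }
}
```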
[jira] [Created] (FLINK-12065) E2E tests fail due to illegal-access warning on Java 9
Chesnay Schepler created FLINK-12065:
-------------------------------------

             Summary: E2E tests fail due to illegal-access warning on Java 9
                 Key: FLINK-12065
                 URL: https://issues.apache.org/jira/browse/FLINK-12065
             Project: Flink
          Issue Type: Sub-task
          Components: Tests
    Affects Versions: 1.9.0
            Reporter: Chesnay Schepler


When accessing inaccessible fields via reflection, a warning like the
following is printed on Java 9:

{code}
WARNING: An illegal reflective access operation has occurred
WARNING: Illegal reflective access by org.apache.flink.core.memory.HybridMemorySegment (file:/home/travis/build/zentol/flink/flink-dist/target/flink-1.9-SNAPSHOT-bin/flink-1.9-SNAPSHOT/lib/flink-dist_2.11-1.9-SNAPSHOT.jar) to field java.nio.Buffer.address
WARNING: All illegal access operations will be denied in a future release
{code}

These warnings are printed into the .out files of the processes, and cause
e2e tests to fail since we check for empty .out files.

From what I've gathered, we cannot disable these warnings, so we'll have to
adapt the check to ignore them. We can't just fix the accesses themselves,
since they also occur in libraries (like Akka's Netty).
[Proposal] Shuffle resources lifecycle management
Hi All,

While working on the pluggable shuffle architecture and looking into the
interactive programming and fine-grained recovery efforts, we realized that
the lifecycle management of intermediate result partitions needs a more
detailed discussion to enable the envisioned use cases. I have prepared a
design document to address this concern. The document proposes certain
extensions to FLIP-31 (Design of Pluggable Shuffle Service):

https://docs.google.com/document/d/13vAJJxfRXAwI4MtO8dux8hHnNMw2Biu5XRrb_hvGehA

Looking forward to your feedback.

Thanks,
Andrey
[jira] [Created] (FLINK-12064) RocksDBKeyedStateBackend uses incorrect key serializer if reconfigure happens during restore
Yu Li created FLINK-12064:
--------------------------

             Summary: RocksDBKeyedStateBackend uses incorrect key serializer if reconfigure happens during restore
                 Key: FLINK-12064
                 URL: https://issues.apache.org/jira/browse/FLINK-12064
             Project: Flink
          Issue Type: Bug
            Reporter: Yu Li
            Assignee: Yu Li
             Fix For: 1.8.0


As titled, the current {{RocksDBKeyedStateBackend}} uses {{keySerializer}}
rather than {{keySerializerProvider.currentSchemaSerializer()}}, which is
incorrect. The issue is not revealed by existing UTs since the current cases
do not check snapshots taken after state schema migration. This is a
regression caused by the FLINK-10043 refactoring work.
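The bug pattern can be modeled with an invented minimal sketch (not the actual Flink API): a provider may swap in a reconfigured serializer during restore, so snapshot code must ask the provider for its current serializer rather than use the instance captured at construction time.

```java
// Invented minimal model of the bug pattern; not the actual Flink API.
class KeySerializerSnapshotSketch {
    static class Serializer {
        final int schemaVersion;
        Serializer(int schemaVersion) { this.schemaVersion = schemaVersion; }
    }

    static class Provider {
        private Serializer current;
        Provider(Serializer initial) { this.current = initial; }
        // Restore may swap in a serializer compatible with the old schema.
        void reconfigure(Serializer reconfigured) { this.current = reconfigured; }
        Serializer currentSchemaSerializer() { return current; }
    }

    // Buggy: snapshots the constructor argument, ignoring any reconfiguration.
    static int buggySnapshotVersion(Serializer constructorArg) {
        return constructorArg.schemaVersion;
    }

    // Fixed: always snapshots the provider's current serializer.
    static int fixedSnapshotVersion(Provider provider) {
        return provider.currentSchemaSerializer().schemaVersion;
    }

    public static void main(String[] args) {
        Serializer original = new Serializer(1);
        Provider provider = new Provider(original);
        provider.reconfigure(new Serializer(2)); // happens during restore
        System.out.println("buggy: " + buggySnapshotVersion(original));
        System.out.println("fixed: " + fixedSnapshotVersion(provider));
    }
}
```

In this toy model the buggy path records schema version 1 in the snapshot even though the state was actually written with version 2, which is exactly the kind of mismatch the ticket describes.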
Re: Bump up the shaded dependency version
Why do you want to bump the shaded-jackson dependency?

On 28/03/2019 17:56, mayo zhang wrote:
> Hi, all
>
> Is there a way to bump up the version of shaded jars like
> flink-shaded-jackson? Or is there someone who can deal with this?
>
> Best,
> Zhang
[jira] [Created] (FLINK-12063) Remove DateTimeUtils and fix datetime function problems in Blink planner
Jark Wu created FLINK-12063:
----------------------------

             Summary: Remove DateTimeUtils and fix datetime function problems in Blink planner
                 Key: FLINK-12063
                 URL: https://issues.apache.org/jira/browse/FLINK-12063
             Project: Flink
          Issue Type: New Feature
          Components: Table SQL / Planner
            Reporter: Jark Wu


The root cause might be similar to FLINK-11935, which is a casting problem.
For example:

{code:sql}
CAST('1500-04-30 12:00:00' AS TIMESTAMP)
{code}

but the result is:

{code}
Optimized : 1500-04-30 12:00:00
Expected  : 1500-04-30 12:00:00.000
Actual    : 1500-04-20 12:00:00.000
{code}
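One plausible source of such a multi-day shift for pre-1582 dates (an assumption for illustration, not a root cause confirmed in the ticket) is mixing java.util's hybrid Julian/Gregorian calendar with the proleptic Gregorian calendar that SQL semantics expect. The sketch below shows that interpreting "1500-04-30" under the two calendars yields instants 10 days apart, matching the magnitude of the shift seen above.

```java
import java.util.Calendar;
import java.util.Date;
import java.util.GregorianCalendar;
import java.util.TimeZone;

// By default, GregorianCalendar switches from Julian to Gregorian rules at
// the 1582 cutover, while forcing the change to Long.MIN_VALUE makes it a
// pure proleptic Gregorian calendar. For April 1500 the two calendars
// disagree by 10 days.
class CalendarCutoverSketch {
    static long diffDays() {
        TimeZone utc = TimeZone.getTimeZone("UTC");

        GregorianCalendar hybrid = new GregorianCalendar(utc); // Julian before 1582
        hybrid.clear();
        hybrid.set(1500, Calendar.APRIL, 30, 12, 0, 0);

        GregorianCalendar proleptic = new GregorianCalendar(utc);
        proleptic.setGregorianChange(new Date(Long.MIN_VALUE)); // pure Gregorian
        proleptic.clear();
        proleptic.set(1500, Calendar.APRIL, 30, 12, 0, 0);

        return (hybrid.getTimeInMillis() - proleptic.getTimeInMillis()) / 86_400_000L;
    }

    public static void main(String[] args) {
        System.out.println("difference in days: " + diffDays());
    }
}
```

If two code paths interpret the same field values under different calendars like this, round-tripping an old timestamp through epoch millis can shift it by exactly such an offset.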
[jira] [Created] (FLINK-12062) Introduce bundle operator to streaming table runtime
Kurt Young created FLINK-12062:
-------------------------------

             Summary: Introduce bundle operator to streaming table runtime
                 Key: FLINK-12062
                 URL: https://issues.apache.org/jira/browse/FLINK-12062
             Project: Flink
          Issue Type: Improvement
          Components: Table SQL / Runtime
            Reporter: Kurt Young


The bundle operator will buffer incoming records in a key-value map. Once the
bundle triggers, the bundle function is invoked with all buffered data, and
one can apply optimizations based on it.

One useful scenario for the bundle operator is "Group Aggregate": we can
organize the bundled data by grouping key. Once the bundle triggers, we first
pre-aggregate all data belonging to the same key in memory, and then only
have to operate on state once per key. This saves a lot of cost and yields
better performance.
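The group-aggregate scenario can be sketched as follows. This is a simplified illustration with invented names; a plain in-memory map stands in for Flink's keyed state, and the trigger is left abstract.

```java
import java.util.HashMap;
import java.util.Map;

// Simplified sketch of the bundling idea: buffer and pre-aggregate records
// per grouping key, then touch "state" only once per distinct key when the
// bundle fires. Names are illustrative, not Flink's actual operator API.
class BundleOperatorSketch {
    private final Map<String, Long> bundle = new HashMap<>(); // in-memory buffer
    private final Map<String, Long> state = new HashMap<>();  // stand-in for keyed state
    int stateAccesses = 0;

    // Buffer and pre-aggregate an incoming record by its grouping key.
    void processElement(String key, long value) {
        bundle.merge(key, value, Long::sum);
    }

    // Called when the bundle trigger fires (e.g. count- or time-based):
    // one state operation per distinct key instead of one per record.
    void finishBundle() {
        for (Map.Entry<String, Long> entry : bundle.entrySet()) {
            state.merge(entry.getKey(), entry.getValue(), Long::sum);
            stateAccesses++;
        }
        bundle.clear();
    }

    long currentAggregate(String key) {
        return state.getOrDefault(key, 0L);
    }

    public static void main(String[] args) {
        BundleOperatorSketch op = new BundleOperatorSketch();
        op.processElement("a", 1);
        op.processElement("b", 2);
        op.processElement("a", 3);
        op.processElement("a", 4);
        op.finishBundle();
        // 4 records arrived, but only 2 state operations were needed.
        System.out.println(op.currentAggregate("a") + " / " + op.stateAccesses);
    }
}
```

The saving grows with the number of records per key in a bundle: state is touched once per distinct key per bundle rather than once per record.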