Re: [VOTE] Release 1.8.0, release candidate #4

2019-03-29 Thread Tzu-Li (Gordon) Tai
@Yu discovered this issue, which IMO is probably a blocker for the
release: https://issues.apache.org/jira/browse/FLINK-12064.
The bug is a regression caused by previous state backend refactorings:
it can result in an incorrect representation of the schema of serialized
keys in state, because the wrong key serializer instance is being
snapshotted.

There is already a PR to fix this; I'll try to review and merge it over the
weekend.

On Fri, Mar 29, 2019 at 7:13 AM Richard Deurwaarder  wrote:

> -1 (non-binding)
>
> - Ran our Flink job's integration tests (1000+) locally; all succeeded.
> - Attempted to run the job on Hadoop; this failed because we have a
> firewall in place and cannot set the REST port to a specific port or port
> range.
> Unless I am mistaken, it seems like FLINK-11081 broke the ability to set
> a REST port when running on YARN (
>
> https://github.com/apache/flink/commit/730eed71ef3f718d61f85d5e94b1060844ca56db#diff-487838863ab693af7008f04cb3359be3R102
> )
> Code-wise it seems rather straightforward to fix, but I am unsure why this
> is hard-coded to 0 and what the impact of changing it would be.
>
> It would benefit us greatly if a fix for this could make it to 1.8.0.
>
> Regards,
>
> Richard
>
> On Thu, Mar 28, 2019 at 9:54 AM Tzu-Li (Gordon) Tai 
> wrote:
>
> > +1 (binding)
> >
> > Functional checks:
> >
> > - Built Flink from source (`mvn clean verify`) locally, with success
> > - Ran end-to-end tests locally 5 times in a loop; no attempts failed
> > (Hadoop 2.8.4, Scala 2.12)
> > - Manually tested state schema evolution for POJO. Besides the tests that
> > @Congxian already did, additionally tested evolution cases with POJO
> > subclasses + non-registered POJOs.
> > - Manually tested migration of Scala stateful jobs that use case classes /
> > Scala collections as state types, performing the migration from Scala
> > 2.11 to Scala 2.12.
> > - Reviewed release announcement PR
> >
> > Misc / legal checks:
> >
> > - checked checksums and signatures
> > - No binaries in source distribution
> > - Staging area does not seem to have any missing artifacts
> >
> > Cheers,
> > Gordon
> >
> > On Thu, Mar 28, 2019 at 4:52 PM Tzu-Li (Gordon) Tai  >
> > wrote:
> >
> > > @Shaoxuan
> > >
> > > The drop in the serializerAvro benchmark, as explained in the voting
> > > threads of earlier RCs, was due to a slower job initialization phase
> > > caused by slower deserialization of the AvroSerializer.
> > > Piotr also pointed out that after the number of records in the serializer
> > > benchmarks was increased, the drop was no longer observable before /
> > > after the mid-February changes.
> > > IMO, this is not critical as it does not affect per-record performance /
> > > throughput, and therefore should not block this release.
> > >
> > > On Thu, Mar 28, 2019 at 1:08 AM Aljoscha Krettek <
> aljos...@fastmail.com>
> > > wrote:
> > >
> > >> By now, I'm reasonably sure that the instabilities in the end-to-end
> > >> tests are only instabilities. I pushed changes that increase timeouts to
> > >> make the tests more stable. As in any project, there will always be bugs,
> > >> but I think we can release this RC4 and be reasonably sure that it works
> > >> well.
> > >>
> > >> Now, we only need the required number of PMC votes.
> > >>
> > >> On Wed, Mar 27, 2019, at 07:22, Congxian Qiu wrote:
> > >> > +1 (non-binding)
> > >> >
> > >> > • checked signature and checksum: ok
> > >> > • mvn clean package -DskipTests: ok
> > >> > • ran job on YARN: ok
> > >> > • tested state migration with POJO type (both heap and RocksDB): ok
> > >> >   - 1.6 -> 1.8
> > >> >   - 1.7 -> 1.8
> > >> >   - 1.8 -> 1.8
> > >> >
> > >> >
> > >> > Best, Congxian
> > >> > On Mar 27, 2019, 10:26 +0800, vino yang ,
> > wrote:
> > >> > > +1 (non-binding)
> > >> > >
> > >> > > - checked JIRA release note
> > >> > > - ran "mvn package -DskipTests"
> > >> > > - checked signature and checksum
> > >> > > - started a cluster locally and ran some examples in binary
> > >> > > - checked web site announcement's PR
> > >> > >
> > >> > > Best,
> > >> > > Vino
> > >> > >
> > >> > >
> > >> > > Xiaowei Jiang wrote on Tue, Mar 26, 2019 at 8:20 PM:
> > >> > >
> > >> > > > +1 (non-binding)
> > >> > > >
> > >> > > > - checked checksums and GPG files
> > >> > > > - build from source successfully
> > >> > > > - run end-to-end precommit tests successfully
> > >> > > > - run end-to-end nightly tests successfully
> > >> > > >
> > >> > > > Xiaowei
> > >> > > > On Tuesday, March 26, 2019, 8:09:19 PM GMT+8, Yu Li <
> > >> car...@gmail.com>
> > >> > > > wrote:
> > >> > > >
> > >> > > > +1 (non-binding)
> > >> > > >
> > >> > > > - Checked release notes: OK
> > >> > > > - Checked sums and signatures: OK
> > >> > > > - Source release
> > >> > > >   - contains no binaries: OK
> > >> > > >   - contains no 1.8-SNAPSHOT references: OK
> > >> > > > - build from source: OK (8u101)
> > >> > > > - mvn clean verify: OK (8u101)
> > >> > > > - Binary 

[jira] [Created] (FLINK-12070) Make blocking result partitions consumable multiple times

2019-03-29 Thread Till Rohrmann (JIRA)
Till Rohrmann created FLINK-12070:
-

 Summary: Make blocking result partitions consumable multiple times
 Key: FLINK-12070
 URL: https://issues.apache.org/jira/browse/FLINK-12070
 Project: Flink
  Issue Type: Improvement
  Components: Runtime / Network
Reporter: Till Rohrmann


In order to avoid writing produced results multiple times for multiple 
consumers, and in order to speed up batch recoveries, we should make 
blocking result partitions consumable multiple times. At the moment a 
blocking result partition is released once its consumers have processed all 
data. Instead, the result partition should only be released once the next blocking 
result has been produced and all consumers of the blocking result partition have 
terminated. Moreover, blocking results should not hold on to slot resources like 
network buffers or memory, as is currently the case with 
{{SpillableSubpartitions}}.
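
Purely to illustrate the release condition described above (hypothetical names, not the actual runtime code), the rule could be expressed as a predicate evaluated per blocking result partition:

{code:java}
// Sketch only: a blocking result partition may be released once the next blocking
// result has been produced AND all of the partition's consumers have terminated.
static boolean canReleaseBlockingPartition(
        boolean nextBlockingResultProduced, boolean allConsumersTerminated) {
    return nextBlockingResultProduced && allConsumersTerminated;
}
{code}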



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (FLINK-12069) Add proper lifecycle management for intermediate result partitions

2019-03-29 Thread Till Rohrmann (JIRA)
Till Rohrmann created FLINK-12069:
-

 Summary: Add proper lifecycle management for intermediate result 
partitions
 Key: FLINK-12069
 URL: https://issues.apache.org/jira/browse/FLINK-12069
 Project: Flink
  Issue Type: Improvement
  Components: Runtime / Coordination, Runtime / Network
Affects Versions: 1.8.0, 1.9.0
Reporter: Till Rohrmann


In order to properly execute batch jobs, we should make the lifecycle 
management of intermediate result partitions the responsibility of the 
{{JobMaster}}/{{Scheduler}} component. The {{Scheduler}} knows best when an 
intermediate result partition is no longer needed and, thus, can be freed. For 
example, a blocking intermediate result should only be released after all 
subsequent blocking intermediate results have been completed, in order to speed 
up potential failovers.

Moreover, having explicit control over intermediate result partitions could 
also enable use cases like result partition sharing between jobs and even 
across clusters (by simply not releasing the result partitions). 
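
As a rough sketch of what explicit lifecycle control could look like (hypothetical interface and method names, not an actual Flink API):

{code:java}
// Sketch only: lifecycle hooks a JobMaster/Scheduler-owned component could expose.
public interface ResultPartitionLifecycle {

    /** Called when the producing task has finished writing the partition. */
    void onPartitionProduced(String resultPartitionId);

    /** Called by the scheduler once the partition is no longer needed, e.g. after all
     *  subsequent blocking intermediate results have been completed. */
    void releasePartition(String resultPartitionId);

    /** Not releasing a partition is what would allow sharing it between jobs or clusters. */
    boolean isPartitionAvailable(String resultPartitionId);
}
{code}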



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (FLINK-12068) Backtrack fail over regions if intermediate results are unavailable

2019-03-29 Thread Till Rohrmann (JIRA)
Till Rohrmann created FLINK-12068:
-

 Summary: Backtrack fail over regions if intermediate results are 
unavailable
 Key: FLINK-12068
 URL: https://issues.apache.org/jira/browse/FLINK-12068
 Project: Flink
  Issue Type: Sub-task
  Components: Runtime / Coordination
Reporter: Till Rohrmann


The batch failover strategy needs to be able to backtrack failover regions if 
an intermediate result is unavailable, either by explicitly checking whether 
the intermediate result partition is available or via a special exception 
indicating that a result partition is no longer available.
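
To illustrate the exception-based variant (the class and method names here are hypothetical, not an existing Flink type), the consumer side could surface a dedicated exception that the failover strategy recognizes and uses to re-execute the producing region:

{code:java}
// Sketch only: a dedicated exception signalling that a required result partition is gone.
public class ResultPartitionUnavailableException extends RuntimeException {

    private final String resultPartitionId;

    public ResultPartitionUnavailableException(String resultPartitionId) {
        super("Intermediate result partition " + resultPartitionId + " is no longer available.");
        this.resultPartitionId = resultPartitionId;
    }

    public String getResultPartitionId() {
        return resultPartitionId;
    }
}
{code}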



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (FLINK-12067) Refactor the constructor of NetworkEnvironment

2019-03-29 Thread zhijiang (JIRA)
zhijiang created FLINK-12067:


 Summary: Refactor the constructor of NetworkEnvironment
 Key: FLINK-12067
 URL: https://issues.apache.org/jira/browse/FLINK-12067
 Project: Flink
  Issue Type: Sub-task
  Components: Runtime / Network
Reporter: zhijiang
Assignee: zhijiang


The constructor of {{NetworkEnvironment}} could be refactored to only take a 
{{NetworkEnvironmentConfiguration}}; the other related components such as 
{{TaskEventDispatcher}}, {{ResultPartitionManager}} and {{NetworkBufferPool}} 
could be created internally.

We also refactor the process of generating the {{NetworkEnvironmentConfiguration}} 
in {{TaskManagerServiceConfiguration}} to use {{numNetworkBuffers}} instead of 
the previous {{networkBufFraction}}, {{networkBufMin}} and {{networkBufMax}}.

Furthermore, we introduce a {{NetworkEnvironmentConfigurationBuilder}} for 
creating a {{NetworkEnvironmentConfiguration}} easily, especially in tests.
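
A rough sketch of the intended shape (the accessor and builder method names below are assumptions, not the final API):

{code:java}
// Sketch only: the constructor takes just the configuration and creates the related
// components internally.
public NetworkEnvironment(NetworkEnvironmentConfiguration config) {
    this.taskEventDispatcher = new TaskEventDispatcher();
    this.resultPartitionManager = new ResultPartitionManager();
    // numNetworkBuffers() / networkBufferSize() are assumed accessor names
    this.networkBufferPool =
        new NetworkBufferPool(config.numNetworkBuffers(), config.networkBufferSize());
}

// In tests, the builder keeps setup concise (method names are assumptions as well):
NetworkEnvironmentConfiguration config =
    new NetworkEnvironmentConfigurationBuilder()
        .setNumNetworkBuffers(1024)
        .build();
{code}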



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (FLINK-12066) Remove StateSerializerProvider field in keyed state backend

2019-03-29 Thread Yu Li (JIRA)
Yu Li created FLINK-12066:
-

 Summary: Remove StateSerializerProvider field in keyed state 
backend
 Key: FLINK-12066
 URL: https://issues.apache.org/jira/browse/FLINK-12066
 Project: Flink
  Issue Type: Improvement
  Components: Runtime / State Backends
Reporter: Yu Li
Assignee: Yu Li
 Fix For: 1.9.0


As mentioned in the [PR review of 
FLINK-10043|https://github.com/apache/flink/pull/7674#discussion_r257630962] 
with Stefan and in an offline discussion with Gordon, after the refactoring work 
the serializer passed to the {{RocksDBKeyedStateBackend}} constructor is already 
final, thus the {{StateSerializerProvider}} field is no longer needed.

For {{HeapKeyedStateBackend}}, the only thing stopping us from passing a final 
serializer is the circular dependency between the backend and 
{{HeapRestoreOperation}}; we aim to decouple them by introducing a new 
{{HeapInternalKeyContext}} as the bridge. For more details, please refer to the 
upcoming PR.
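
A minimal before/after sketch of the field change (simplified; not the actual backend code):

{code:java}
// Before: the backend keeps the provider and resolves the key serializer lazily.
private final StateSerializerProvider<K> keySerializerProvider;

// After: the already reconfigured, final serializer is handed in via the constructor.
private final TypeSerializer<K> keySerializer;
{code}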



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (FLINK-12065) E2E tests fail due to illegal-access warning on Java 9

2019-03-29 Thread Chesnay Schepler (JIRA)
Chesnay Schepler created FLINK-12065:


 Summary: E2E tests fail due to illegal-access warning on Java 9
 Key: FLINK-12065
 URL: https://issues.apache.org/jira/browse/FLINK-12065
 Project: Flink
  Issue Type: Sub-task
  Components: Tests
Affects Versions: 1.9.0
Reporter: Chesnay Schepler


When inaccessible fields are accessed via reflection, a warning like the one 
below is printed on Java 9:
{code}
WARNING: An illegal reflective access operation has occurred
WARNING: All illegal access operations will be denied in a future release
WARNING: An illegal reflective access operation has occurred
WARNING: Illegal reflective access by 
org.apache.flink.core.memory.HybridMemorySegment 
(file:/home/travis/build/zentol/flink/flink-dist/target/flink-1.9-SNAPSHOT-bin/flink-1.9-SNAPSHOT/lib/flink-dist_2.11-1.9-SNAPSHOT.jar)
 to field java.nio.Buffer.address
{code}
These are printed into the .out file of the processes, and cause e2e tests to 
fail since we check for empty .out files.

From what I've gathered we cannot disable these warnings, so we'll have to 
adapt the check to ignore them.
We can't just fix these accesses ourselves, since they also occur in libraries 
(like Akka's Netty).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[Proposal] Shuffle resources lifecycle management

2019-03-29 Thread Andrey Zagrebin
Hi All,

While working on the pluggable shuffle architecture and looking into the
interactive programming and fine-grained recovery efforts, we realized that the
lifecycle management of intermediate result partitions needs a more detailed
discussion to enable the envisioned use cases.

Here I prepared a design document to address this concern. The document
proposes certain extensions to FLIP-31 (Design of Pluggable Shuffle
Service):

https://docs.google.com/document/d/13vAJJxfRXAwI4MtO8dux8hHnNMw2Biu5XRrb_hvGehA

Looking forward to your feedback.

Thanks,
Andrey


[jira] [Created] (FLINK-12064) RocksDBKeyedStateBackend uses incorrect key serializer if reconfigure happens during restore

2019-03-29 Thread Yu Li (JIRA)
Yu Li created FLINK-12064:
-

 Summary: RocksDBKeyedStateBackend uses incorrect key serializer if 
reconfigure happens during restore
 Key: FLINK-12064
 URL: https://issues.apache.org/jira/browse/FLINK-12064
 Project: Flink
  Issue Type: Bug
Reporter: Yu Li
Assignee: Yu Li
 Fix For: 1.8.0


As titled, the current {{RocksDBKeyedStateBackend}} uses {{keySerializer}} 
rather than {{keySerializerProvider.currentSchemaSerializer()}}, which is 
incorrect. The issue is not revealed by the existing unit tests, since the current 
cases do not check the snapshot taken after state schema migration.

This is a regression caused by the FLINK-10043 refactoring work.
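
A minimal sketch of the intended fix, simplified from the description above (not the exact backend code):

{code:java}
// Sketch only: when snapshotting, use the serializer that reflects the current
// (possibly reconfigured) schema rather than the instance passed to the constructor.
TypeSerializer<K> snapshotKeySerializer =
    keySerializerProvider.currentSchemaSerializer();   // correct after a reconfigure
// ... instead of:
// TypeSerializer<K> snapshotKeySerializer = keySerializer;  // wrong if reconfigure happened
{code}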



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Re: Bump up the shaded dependency version

2019-03-29 Thread Chesnay Schepler

Why do you want to bump the shaded-jackson dependency?

On 28/03/2019 17:56, mayo zhang wrote:

Hi, all

Is there some way to bump up the version of shaded jars like 
flink-shaded-jackson? Or is there someone who can deal with this?




Best,
Zhang





[jira] [Created] (FLINK-12063) Remove DateTimeUtils and fix datetime function problems in Blink planner

2019-03-29 Thread Jark Wu (JIRA)
Jark Wu created FLINK-12063:
---

 Summary: Remove DateTimeUtils and fix datetime function problems 
in Blink planner
 Key: FLINK-12063
 URL: https://issues.apache.org/jira/browse/FLINK-12063
 Project: Flink
  Issue Type: New Feature
  Components: Table SQL / Planner
Reporter: Jark Wu


The root cause might be similar to FLINK-11935, which is a casting problem. 
For example: 


{code:sql}
CAST('1500-04-30 12:00:00' AS TIMESTAMP)
{code}
but the result is:

{code}
Optimized : 1500-04-30 12:00:00
Expected :1500-04-30 12:00:00.000
Actual   :1500-04-20 12:00:00.000
{code}





--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (FLINK-12062) Introduce bundle operator to streaming table runtime

2019-03-29 Thread Kurt Young (JIRA)
Kurt Young created FLINK-12062:
--

 Summary: Introduce bundle operator to streaming table runtime
 Key: FLINK-12062
 URL: https://issues.apache.org/jira/browse/FLINK-12062
 Project: Flink
  Issue Type: Improvement
  Components: Table SQL / Runtime
Reporter: Kurt Young


The bundle operator saves incoming records in a key-value map. Once the bundle 
triggers, the bundle function is invoked and all buffered data is passed in, so 
one can apply optimizations based on it.

One useful scenario for the bundle operator is "Group Aggregate". We can organize 
the bundled data by grouping key. Once the bundle triggers, we first pre-aggregate 
all data belonging to the same key in memory, and then only have to access state 
once per key. This saves a lot of cost and yields better performance.
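
To make the idea concrete, here is a minimal, self-contained sketch (hypothetical class name, not the proposed runtime operator) of buffering and pre-aggregating records per key so that state is only touched once per key when the bundle triggers:

{code:java}
import java.util.HashMap;
import java.util.Map;
import java.util.function.BinaryOperator;

// Sketch only: buffer records per key, pre-aggregate on insert, and flush each key's
// pre-aggregated value to "state" exactly once when the bundle triggers.
public class BundleBuffer<K, V> {

    private final Map<K, V> bundle = new HashMap<>();
    private final BinaryOperator<V> preAggregate;

    public BundleBuffer(BinaryOperator<V> preAggregate) {
        this.preAggregate = preAggregate;
    }

    /** Adds an incoming record, merging it with any value already buffered for its key. */
    public void add(K key, V value) {
        bundle.merge(key, value, preAggregate);
    }

    /** Called when the bundle triggers: one state access per distinct key in the bundle. */
    public void finishBundle(Map<K, V> state) {
        bundle.forEach((key, value) -> state.merge(key, value, preAggregate));
        bundle.clear();
    }
}
{code}

With {{preAggregate = Long::sum}}, for example, a grouped count accesses state only once per distinct key per bundle, regardless of how many records arrived for that key.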



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)