[jira] [Created] (FLINK-6768) Quadratic/Inefficient field lookup in PojoSerializer#ensureCompatibility

2017-05-29 Thread Till Rohrmann (JIRA)
Till Rohrmann created FLINK-6768:


 Summary: Quadratic/Inefficient field lookup in 
PojoSerializer#ensureCompatibility
 Key: FLINK-6768
 URL: https://issues.apache.org/jira/browse/FLINK-6768
 Project: Flink
  Issue Type: Improvement
  Components: State Backends, Checkpointing, Type Serialization System
Affects Versions: 1.3.0, 1.4.0
Reporter: Till Rohrmann
Priority: Minor


In the {{PojoSerializer#ensureCompatibility}} method we call, for each field in 
the {{PojoSerializerConfigSnapshot}}, the {{findField}} method, which executes a 
linear scan of the {{PojoSerializer}}'s field array. This effectively gives the 
method quadratic runtime and could be done more efficiently. For example, we 
could construct a {{HashMap}} containing the fields as keys and their indices as 
values. That way we would get much better lookup performance.
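
A minimal sketch of the proposed lookup structure, assuming fields are looked up 
by name as {{findField}} does today; the class and method names below are 
illustrative, not the actual {{PojoSerializer}} API:

{code:java}
import java.lang.reflect.Field;
import java.util.HashMap;
import java.util.Map;

class FieldIndexLookup {

    /** Build the name -> position map once over the serializer's field array. */
    static Map<String, Integer> buildFieldIndexMap(Field[] fields) {
        Map<String, Integer> indexByName = new HashMap<>(fields.length);
        for (int i = 0; i < fields.length; i++) {
            if (fields[i] != null) {
                indexByName.put(fields[i].getName(), i);
            }
        }
        return indexByName;
    }

    /** Constant-time replacement for the linear findField scan. */
    static int findField(Map<String, Integer> indexByName, String fieldName) {
        Integer index = indexByName.get(fieldName);
        return index != null ? index : -1;
    }
}
{code}

With this, {{ensureCompatibility}} would build the map once and then perform an 
O(1) lookup per field instead of a scan per field.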



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (FLINK-6767) Cannot load user class on local environment

2017-05-29 Thread Matt (JIRA)
Matt created FLINK-6767:
---

 Summary: Cannot load user class on local environment
 Key: FLINK-6767
 URL: https://issues.apache.org/jira/browse/FLINK-6767
 Project: Flink
  Issue Type: Bug
  Components: Local Runtime
Affects Versions: 1.2.1
 Environment: Flink 1.2.1 running on a local environment on an Ignite 2.0 
node.
Reporter: Matt
Priority: Critical


This bug was discussed in [1], and a fix was proposed in [2]. In summary, the 
fix requires line 298 in BlobLibraryCacheManager.java [3] to be changed to:

{code:java}
this.classLoader = new FlinkUserCodeClassLoader(libraryURLs,
        Thread.currentThread().getContextClassLoader());
{code}

A repository with a complete test case reproducing the error is found in [4]. 
The readme file contains details on how to run it and the resulting exception:

{code}
org.apache.flink.streaming.runtime.tasks.StreamTaskException: Cannot load user 
class: com.test.Source
ClassLoader info: URL ClassLoader:
Class not resolvable through given classloader.
{code}

[1] 
http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/BUG-Cannot-Load-User-Class-on-Local-Environment-td12799.html

[2] 
http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/BUG-Cannot-Load-User-Class-on-Local-Environment-tp12799p13376.html

[3] 
https://github.com/apache/flink/blob/release-1.2/flink-runtime/src/main/java/org/apache/flink/runtime/execution/librarycache/BlobLibraryCacheManager.java#L298

[4] https://github.com/Dromit/FlinkTest



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (FLINK-6766) Update documentation with async backends and incremental checkpoints

2017-05-29 Thread Stefan Richter (JIRA)
Stefan Richter created FLINK-6766:
-

 Summary: Update documentation with async backends and incremental 
checkpoints
 Key: FLINK-6766
 URL: https://issues.apache.org/jira/browse/FLINK-6766
 Project: Flink
  Issue Type: Improvement
  Components: State Backends, Checkpointing
Reporter: Stefan Richter
Assignee: Stefan Richter
 Fix For: 1.3.0


This PR introduces some documentation about async heap backends and incremental 
snapshots with RocksDB.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (FLINK-6765) PojoSerializer returns wrong ConvertDeserializer when migration is required

2017-05-29 Thread Till Rohrmann (JIRA)
Till Rohrmann created FLINK-6765:


 Summary: PojoSerializer returns wrong ConvertDeserializer when 
migration is required
 Key: FLINK-6765
 URL: https://issues.apache.org/jira/browse/FLINK-6765
 Project: Flink
  Issue Type: Bug
  Components: Type Serialization System
Affects Versions: 1.3.0, 1.4.0
Reporter: Till Rohrmann


The {{PojoSerializer}} returns the wrong {{ConvertDeserializer}} when migration 
is required. The reason is the following code at PojoSerializer.java:698:

{code}
if (compatResult.getConvertDeserializer() != null) {
    rebuiltCache.put(previousCachedEntry.getKey(), cachedSerializer);
} else {
    return CompatibilityResult.requiresMigration();
}
{code}

which, in my opinion, should be:

{code}
if (compatResult.getConvertDeserializer() != null) {
    rebuiltCache.put(previousCachedEntry.getKey(), compatResult.getConvertDeserializer());
} else {
    return CompatibilityResult.requiresMigration();
}
{code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


Re: [VOTE] Release Apache Flink 1.3.0 (RC3)

2017-05-29 Thread jincheng sun
Hi Robert,

+1 to release:

The following items have been checked (flink-table):

1. Check that the Java and Scala logical plans are consistent.
2. Check that the SQL and Table API logical plans are consistent.
3. Check that UDF, UDTF, and UDAF work properly in group-windows and
over-windows.
4. Check that all built-in aggregates on Batch and Stream work properly.
5. Let types such as Timestamp, BigDecimal, or POJO flow through UDF, UDTF, and
UDAF (input and output types).

Cheers,
SunJincheng


2017-05-29 22:57 GMT+08:00 Gyula Fóra :

> Hi,
>
> I have found an issue with rescaling incremental checkpoints:
> https://issues.apache.org/jira/browse/FLINK-6762
>
> I am not sure if this is regarded as a blocker, it depends on our
> assumptions about externalized checkpoints.
>
> What do you think?
>
> Cheers,
> Gyula
>
> Robert Metzger wrote (on 2017 May 29, Mon, 15:59):
>
> > +1 to release:
> >
> > - Tested building job against staging repository
> > - tested YARN session start and container recovery on YARN
> > - Validated HA on YARN (per job and session mode)
> > - Incremental checkpointing with rocksdb works
> > - FsStatebackend with async snapshots works
> >
> > - Flink builds from source on Linux (includes rat license header check)
> > - source doesn't contain binaries
> > - checked md5 / sha512 sum of source archive (assuming the other sums are
> > valid as well)
> >
> >
> > I noticed one minor thing: the copyright in the NOTICE file is 2014-2016.
> > I'll update it on master and the release branch.
> >
> >
> > On Mon, May 29, 2017 at 12:42 PM, Chesnay Schepler 
> > wrote:
> >
> > > +1
> > >
> > >  * Builds from source
> > >  * start/stop scripts work
> > >  * logs don't show anything suspicious on startup/shutdown
> > >  * ran some example jobs
> > >  * ran jobs on yarn with exactly-once & RocksDB
> > >  o canceling with savepoint/resuming from savepoint
> > >  o without/with rescaling
> > >  * SideOutputs work
> > >  * Metrics are properly transmitted & displayed in the webUI
> > >
> > >
> > > On 28.05.2017 20:33, Robert Metzger wrote:
> > >
> > >> @Shaoxuan, I don't think missing documentation is a release blocker,
> but
> > >> it's something we should fix asap and with high priority :)
> > >> Since the documentation is not bundled with the release, and the docs
> > are
> > >> built off the "release-x.y" branch, we can always update them (even
> > after
> > >> the release).
> > >>
> > >> However, it makes sense to have the docs ready when we announce the
> > >> release
> > >> so that users can understand how to use the newly released features.
> > >>
> > >> On Sat, May 27, 2017 at 6:14 PM, Chesnay Schepler  >
> > >> wrote:
> > >>
> > >> I've responded in the JIRA. In my opinion it isn't a functional issue,
> but
> > >>> more about improving
> > >>> error messages/documentation.
> > >>>
> > >>>
> > >>> On 27.05.2017 18:07, Gyula Fóra wrote:
> > >>>
> > >>> Hi!
> >  I have found this issue: https://issues.apache.org/jira
> >  /browse/FLINK-6742
> > 
> >  Not sure if it's a blocker or not (not even completely sure what
> > causes
> >  it
> >  at the moment)
> > 
> >  Gyula
> > 
 Shaoxuan Wang wrote (on 2017 May 27, Sat, 6:30):
> > 
> >  Hi Robert.
> > 
> > > Will doc update be a blocker for release?
> > > Release 1.3 has many updates on tableAPI and SQL, and the docs are
> > kind
> > > of
> > > lagging. We are trying our best to update the doc as much as
> > > possible
> > > before the release is officially published, but I am hoping this is
> > > not a
> > > blocker for the release.
> > >
> > > Shaoxuan
> > >
> > >
> > >
> > > On Sat, May 27, 2017 at 12:58 AM, Robert Metzger <
> > rmetz...@apache.org>
> > > wrote:
> > >
> > > Hi all,
> > >
> > >> this is the second VOTEing release candidate for Flink 1.3.0
> > >>
> > >> The commit to be voted on:
> > >> 760eea8a
> > >> (http://git-wip-us.apache.org/repos/asf/flink/commit/760eea8a)
> > >>
> > >> Branch:
> > >> release-1.3.0-rc3
> > >>
> > >> The release artifacts to be voted on can be found at:
> > >> http://people.apache.org/~rmetzger/flink-1.3.0-rc3
> > >>
> > >>
> > >> The release artifacts are signed with the key with fingerprint
> > >> D9839159:
> > >> http://www.apache.org/dist/flink/KEYS
> > >>
> > >> The staging repository for this release can be found at:
> > >> https://repository.apache.org/content/repositories/orgapacheflink-1122
> > >>
> > >> -
> >

Re: [VOTE] Release Apache Flink 1.3.0 (RC3)

2017-05-29 Thread Stefan Richter
Hi,

Officially, we currently do not want to support rescaling from (incremental) 
checkpoints, but only from savepoints. For this reason, I would not consider 
this a blocking issue.

Unofficially, I think we are not too far away from supporting this, but there 
is a question mark over the efficiency. The procedure will involve redundant 
data shuffling and reconstruction effort because incremental checkpoints 
cannot track key-groups.

Best,
Stefan 

> On 29.05.2017 at 16:57, Gyula Fóra wrote:
> 
> Hi,
> 
> I have found an issue with rescaling incremental checkpoints:
> https://issues.apache.org/jira/browse/FLINK-6762
> 
> I am not sure if this is regarded as a blocker, it depends on our
> assumptions about externalized checkpoints.
> 
> What do you think?
> 
> Cheers,
> Gyula
> 
> Robert Metzger wrote (on 2017 May 29, Mon, 15:59):
> 
>> +1 to release:
>> 
>> - Tested building job against staging repository
>> - tested YARN session start and container recovery on YARN
>> - Validated HA on YARN (per job and session mode)
>> - Incremental checkpointing with rocksdb works
>> - FsStatebackend with async snapshots works
>> 
>> - Flink builds from source on Linux (includes rat license header check)
>> - source doesn't contain binaries
>> - checked md5 / sha512 sum of source archive (assuming the other sums are
>> valid as well)
>> 
>> 
>> I noticed one minor thing: the copyright in the NOTICE file is 2014-2016.
>> I'll update it on master and the release branch.
>> 
>> 
>> On Mon, May 29, 2017 at 12:42 PM, Chesnay Schepler 
>> wrote:
>> 
>>> +1
>>> 
>>> * Builds from source
>>> * start/stop scripts work
>>> * logs don't show anything suspicious on startup/shutdown
>>> * ran some example jobs
>>> * ran jobs on yarn with exactly-once & RocksDB
>>> o canceling with savepoint/resuming from savepoint
>>> o without/with rescaling
>>> * SideOutputs work
>>> * Metrics are properly transmitted & displayed in the webUI
>>> 
>>> 
>>> On 28.05.2017 20:33, Robert Metzger wrote:
>>> 
 @Shaoxuan, I don't think missing documentation is a release blocker, but
 it's something we should fix asap and with high priority :)
 Since the documentation is not bundled with the release, and the docs
>> are
 built off the "release-x.y" branch, we can always update them (even
>> after
 the release).
 
 However, it makes sense to have the docs ready when we announce the
 release
 so that users can understand how to use the newly released features.
 
 On Sat, May 27, 2017 at 6:14 PM, Chesnay Schepler 
 wrote:
 
 I've responded in the JIRA. In my opinion it isn't a functional issue, but
> more about improving
> error messages/documentation.
> 
> 
> On 27.05.2017 18:07, Gyula Fóra wrote:
> 
> Hi!
>> I have found this issue: https://issues.apache.org/jira
>> /browse/FLINK-6742
>> 
>> Not sure if it's a blocker or not (not even completely sure what
>> causes
>> it
>> at the moment)
>> 
>> Gyula
>> 
>> Shaoxuan Wang wrote (on 2017 May 27, Sat, 6:30):
>> 
>> Hi Robert.
>> 
>>> Will doc update be a blocker for release?
>>> Release 1.3 has many updates on tableAPI and SQL, and the docs are
>> kind
>>> of
>>> lagging. We are trying our best to update the doc as much as
>>> possible
>>> before the release is officially published, but I am hoping this is
>>> not a
>>> blocker for the release.
>>> 
>>> Shaoxuan
>>> 
>>> 
>>> 
>>> On Sat, May 27, 2017 at 12:58 AM, Robert Metzger <
>> rmetz...@apache.org>
>>> wrote:
>>> 
>>> Hi all,
>>> 
 this is the second VOTEing release candidate for Flink 1.3.0
 
 The commit to be voted on:
 760eea8a
 (http://git-wip-us.apache.org/repos/asf/flink/commit/760eea8a)
 
 Branch:
 release-1.3.0-rc3
 
 The release artifacts to be voted on can be found at:
 http://people.apache.org/~rmetzger/flink-1.3.0-rc3
 
 
 The release artifacts are signed with the key with fingerprint
 D9839159:
 http://www.apache.org/dist/flink/KEYS
 
 The staging repository for this release can be found at:
 https://repository.apache.org/content/repositories/orgapacheflink-1122
 

[jira] [Created] (FLINK-6764) Deduplicate stateless TypeSerializers when serializing composite TypeSerializers

2017-05-29 Thread Till Rohrmann (JIRA)
Till Rohrmann created FLINK-6764:


 Summary: Deduplicate stateless TypeSerializers when serializing 
composite TypeSerializers
 Key: FLINK-6764
 URL: https://issues.apache.org/jira/browse/FLINK-6764
 Project: Flink
  Issue Type: Improvement
  Components: Type Serialization System
Affects Versions: 1.3.0, 1.4.0
Reporter: Till Rohrmann


Composite type serializers, such as the {{PojoSerializer}}, could be improved by 
deduplicating stateless {{TypeSerializers}} when they are serialized. This would 
decrease their serialized size.
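
A minimal sketch of what such deduplication could look like when writing nested 
serializers, assuming stateless serializers compare equal via {{equals}}; the 
class and method names are illustrative, and writing the actual serializer 
payload is left as a placeholder:

{code:java}
import java.io.DataOutput;
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

class SerializerDeduplicator {

    // Maps an already-written serializer to its position in the write order.
    private final Map<Object, Integer> seen = new HashMap<>();

    void writeSerializer(DataOutput out, Object serializer) throws IOException {
        Integer ref = seen.get(serializer);
        if (ref != null) {
            out.writeBoolean(true);   // back-reference to an earlier serializer
            out.writeInt(ref);
        } else {
            out.writeBoolean(false);  // full serializer payload follows
            seen.put(serializer, seen.size());
            // ... write the serialized serializer bytes here ...
        }
    }
}
{code}

On read, the same ordering is reconstructed so that back-references can be 
resolved to the already-deserialized instances.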



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (FLINK-6763) Inefficient PojoSerializerConfigSnapshot serialization format

2017-05-29 Thread Till Rohrmann (JIRA)
Till Rohrmann created FLINK-6763:


 Summary: Inefficient PojoSerializerConfigSnapshot serialization 
format
 Key: FLINK-6763
 URL: https://issues.apache.org/jira/browse/FLINK-6763
 Project: Flink
  Issue Type: Improvement
  Components: State Backends, Checkpointing, Type Serialization System
Affects Versions: 1.3.0, 1.4.0
Reporter: Till Rohrmann


The {{PojoSerializerConfigSnapshot}} stores, for each serializer, the beginning 
and ending offset in the serialization stream. This information is also written 
if the serializer's serialization is supposed to be ignored. The beginning and 
ending offsets are stored as a sequence of integers at the beginning of the 
serialization stream. We store this information to be able to skip broken 
serializers.

I think we don't need both offsets. Instead, I would suggest writing the length 
of the serialized serializer into the serialization stream first, followed by 
the serialized serializer itself. This can be done in 
{{TypeSerializerSerializationUtil.writeSerializer}}. When reading the 
serializer via {{TypeSerializerSerializationUtil.tryReadSerializer}}, we can 
try to deserialize it. If this operation fails, we can skip exactly the 
serialized serializer's bytes because we know its length.
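
A minimal sketch of the proposed length-prefixed format, assuming a plain 
DataOutput/DataInput style stream; the class and method names are illustrative, 
not the actual {{TypeSerializerSerializationUtil}} API:

{code:java}
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

class LengthPrefixedSerializerFormat {

    /** Write the serialized serializer preceded by its length in bytes. */
    static void writeSerializer(DataOutputStream out, byte[] serializedSerializer)
            throws IOException {
        out.writeInt(serializedSerializer.length);
        out.write(serializedSerializer);
    }

    /**
     * Read the serialized serializer into a buffer. Because exactly 'length'
     * bytes are consumed, a failed deserialization attempt on the returned
     * buffer leaves the stream positioned right after the broken serializer,
     * so reading can simply continue with the next entry.
     */
    static byte[] readSerializerBytes(DataInputStream in) throws IOException {
        int length = in.readInt();
        byte[] bytes = new byte[length];
        in.readFully(bytes);
        return bytes;
    }
}
{code}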



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


Re: [VOTE] Release Apache Flink 1.3.0 (RC3)

2017-05-29 Thread Gyula Fóra
Hi,

I have found an issue with rescaling incremental checkpoints:
https://issues.apache.org/jira/browse/FLINK-6762

I am not sure if this is regarded as a blocker, it depends on our
assumptions about externalized checkpoints.

What do you think?

Cheers,
Gyula

Robert Metzger wrote (on 2017 May 29, Mon, 15:59):

> +1 to release:
>
> - Tested building job against staging repository
> - tested YARN session start and container recovery on YARN
> - Validated HA on YARN (per job and session mode)
> - Incremental checkpointing with rocksdb works
> - FsStatebackend with async snapshots works
>
> - Flink builds from source on Linux (includes rat license header check)
> - source doesn't contain binaries
> - checked md5 / sha512 sum of source archive (assuming the other sums are
> valid as well)
>
>
> I noticed one minor thing: the copyright in the NOTICE file is 2014-2016.
> I'll update it on master and the release branch.
>
>
> On Mon, May 29, 2017 at 12:42 PM, Chesnay Schepler 
> wrote:
>
> > +1
> >
> >  * Builds from source
> >  * start/stop scripts work
> >  * logs don't show anything suspicious on startup/shutdown
> >  * ran some example jobs
> >  * ran jobs on yarn with exactly-once & RocksDB
> >  o canceling with savepoint/resuming from savepoint
> >  o without/with rescaling
> >  * SideOutputs work
> >  * Metrics are properly transmitted & displayed in the webUI
> >
> >
> > On 28.05.2017 20:33, Robert Metzger wrote:
> >
> >> @Shaoxuan, I don't think missing documentation is a release blocker, but
> >> it's something we should fix asap and with high priority :)
> >> Since the documentation is not bundled with the release, and the docs
> are
> >> built off the "release-x.y" branch, we can always update them (even
> after
> >> the release).
> >>
> >> However, it makes sense to have the docs ready when we announce the
> >> release
> >> so that users can understand how to use the newly released features.
> >>
> >> On Sat, May 27, 2017 at 6:14 PM, Chesnay Schepler 
> >> wrote:
> >>
> >> I've responded in the JIRA. In my opinion it isn't a functional issue, but
> >>> more about improving
> >>> error messages/documentation.
> >>>
> >>>
> >>> On 27.05.2017 18:07, Gyula Fóra wrote:
> >>>
> >>> Hi!
>  I have found this issue: https://issues.apache.org/jira
>  /browse/FLINK-6742
> 
>  Not sure if it's a blocker or not (not even completely sure what
> causes
>  it
>  at the moment)
> 
>  Gyula
> 
>  Shaoxuan Wang wrote (on 2017 May 27, Sat, 6:30):
> 
>  Hi Robert.
> 
> > Will doc update be a blocker for release?
> > Release 1.3 has many updates on tableAPI and SQL, and the docs are
> kind
> > of
> > lagging. We are trying our best to update the doc as much as
> > possible
> > before the release is officially published, but I am hoping this is
> > not a
> > blocker for the release.
> >
> > Shaoxuan
> >
> >
> >
> > On Sat, May 27, 2017 at 12:58 AM, Robert Metzger <
> rmetz...@apache.org>
> > wrote:
> >
> > Hi all,
> >
> >> this is the second VOTEing release candidate for Flink 1.3.0
> >>
> >> The commit to be voted on:
> >> 760eea8a
> >> (http://git-wip-us.apache.org/repos/asf/flink/commit/760eea8a)
> >>
> >> Branch:
> >> release-1.3.0-rc3
> >>
> >> The release artifacts to be voted on can be found at:
> >> http://people.apache.org/~rmetzger/flink-1.3.0-rc3
> >>
> >>
> >> The release artifacts are signed with the key with fingerprint
> >> D9839159:
> >> http://www.apache.org/dist/flink/KEYS
> >>
> >> The staging repository for this release can be found at:
> >> https://repository.apache.org/content/repositories/orgapacheflink-1122
> >>
> >> -
> >>
> >>
> >> The vote ends on Tuesday (May 30th), 7pm CET.
> >>
> >> [ ] +1 Release this package as Apache Flink 1.3.0
> >> [ ] -1 Do not release this package, because ...
> >>
> >>
> >>
> >
>


[jira] [Created] (FLINK-6762) Cannot rescale externalized incremental checkpoints

2017-05-29 Thread Gyula Fora (JIRA)
Gyula Fora created FLINK-6762:
-

 Summary: Cannot rescale externalized incremental checkpoints
 Key: FLINK-6762
 URL: https://issues.apache.org/jira/browse/FLINK-6762
 Project: Flink
  Issue Type: Bug
  Components: State Backends, Checkpointing
Affects Versions: 1.3.0
Reporter: Gyula Fora
Priority: Critical


When a job is rescaled from an externalized incremental checkpoint, the 
subsequent checkpoints fail with the following error:

org.apache.flink.runtime.checkpoint.CheckpointException: Could not finalize the pending checkpoint 3205.
	at org.apache.flink.runtime.checkpoint.CheckpointCoordinator.completePendingCheckpoint(CheckpointCoordinator.java:861)
	at org.apache.flink.runtime.checkpoint.CheckpointCoordinator.receiveAcknowledgeMessage(CheckpointCoordinator.java:776)
	at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$org$apache$flink$runtime$jobmanager$JobManager$$handleCheckpointMessage$1.apply$mcV$sp(JobManager.scala:1462)
	at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$org$apache$flink$runtime$jobmanager$JobManager$$handleCheckpointMessage$1.apply(JobManager.scala:1461)
	at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$org$apache$flink$runtime$jobmanager$JobManager$$handleCheckpointMessage$1.apply(JobManager.scala:1461)
	at scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24)
	at scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24)
	at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:40)
	at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:397)
	at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
	at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
	at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
	at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
Caused by: java.io.IOException: Unknown implementation of StreamStateHandle: class org.apache.flink.runtime.state.PlaceholderStreamStateHandle
	at org.apache.flink.runtime.checkpoint.savepoint.SavepointV2Serializer.serializeStreamStateHandle(SavepointV2Serializer.java:484)
	at org.apache.flink.runtime.checkpoint.savepoint.SavepointV2Serializer.serializeStreamStateHandleMap(SavepointV2Serializer.java:342)
	at org.apache.flink.runtime.checkpoint.savepoint.SavepointV2Serializer.serializeKeyedStateHandle(SavepointV2Serializer.java:329)
	at org.apache.flink.runtime.checkpoint.savepoint.SavepointV2Serializer.serializeSubtaskState(SavepointV2Serializer.java:270)
	at org.apache.flink.runtime.checkpoint.savepoint.SavepointV2Serializer.serialize(SavepointV2Serializer.java:122)
	at org.apache.flink.runtime.checkpoint.savepoint.SavepointV2Serializer.serialize(SavepointV2Serializer.java:66)
	at org.apache.flink.runtime.checkpoint.savepoint.SavepointStore.storeSavepointToHandle(SavepointStore.java:199)
	at org.apache.flink.runtime.checkpoint.savepoint.SavepointStore.storeExternalizedCheckpointToHandle(SavepointStore.java:164)
	at org.apache.flink.runtime.checkpoint.PendingCheckpoint.finalizeCheckpointExternalized(PendingCheckpoint.java:286)
	at org.apache.flink.runtime.checkpoint.CheckpointCoordinator.completePendingCheckpoint(CheckpointCoordinator.java:851)

Full log:
https://gist.github.com/gyfora/693b9a720aace843ff4570e504c4a242

Rescaling with savepoints works.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


Re: [VOTE] Release Apache Flink 1.3.0 (RC3)

2017-05-29 Thread Robert Metzger
+1 to release:

- Tested building job against staging repository
- tested YARN session start and container recovery on YARN
- Validated HA on YARN (per job and session mode)
- Incremental checkpointing with rocksdb works
- FsStatebackend with async snapshots works

- Flink builds from source on Linux (includes rat license header check)
- source doesn't contain binaries
- checked md5 / sha512 sum of source archive (assuming the other sums are
valid as well)


I noticed one minor thing: the copyright in the NOTICE file is 2014-2016.
I'll update it on master and the release branch.


On Mon, May 29, 2017 at 12:42 PM, Chesnay Schepler 
wrote:

> +1
>
>  * Builds from source
>  * start/stop scripts work
>  * logs don't show anything suspicious on startup/shutdown
>  * ran some example jobs
>  * ran jobs on yarn with exactly-once & RocksDB
>  o canceling with savepoint/resuming from savepoint
>  o without/with rescaling
>  * SideOutputs work
>  * Metrics are properly transmitted & displayed in the webUI
>
>
> On 28.05.2017 20:33, Robert Metzger wrote:
>
>> @Shaoxuan, I don't think missing documentation is a release blocker, but
>> it's something we should fix asap and with high priority :)
>> Since the documentation is not bundled with the release, and the docs are
>> built off the "release-x.y" branch, we can always update them (even after
>> the release).
>>
>> However, it makes sense to have the docs ready when we announce the
>> release
>> so that users can understand how to use the newly released features.
>>
>> On Sat, May 27, 2017 at 6:14 PM, Chesnay Schepler 
>> wrote:
>>
>> I've responded in the JIRA. In my opinion it isn't a functional issue, but
>>> more about improving
>>> error messages/documentation.
>>>
>>>
>>> On 27.05.2017 18:07, Gyula Fóra wrote:
>>>
>>> Hi!
 I have found this issue: https://issues.apache.org/jira
 /browse/FLINK-6742

 Not sure if it's a blocker or not (not even completely sure what causes
 it
 at the moment)

 Gyula

 Shaoxuan Wang wrote (on 2017 May 27, Sat, 6:30):

 Hi Robert.

> Will doc update be a blocker for release?
> Release 1.3 has many updates on tableAPI and SQL, and the docs are kind
> of
> lagging. We are trying our best to update the doc as much as
> possible
> before the release is officially published, but I am hoping this is
> not a
> blocker for the release.
>
> Shaoxuan
>
>
>
> On Sat, May 27, 2017 at 12:58 AM, Robert Metzger 
> wrote:
>
> Hi all,
>
>> this is the second VOTEing release candidate for Flink 1.3.0
>>
>> The commit to be voted on:
>> 760eea8a
>> (http://git-wip-us.apache.org/repos/asf/flink/commit/760eea8a)
>>
>> Branch:
>> release-1.3.0-rc3
>>
>> The release artifacts to be voted on can be found at:
>> http://people.apache.org/~rmetzger/flink-1.3.0-rc3
>>
>>
>> The release artifacts are signed with the key with fingerprint
>> D9839159:
>> http://www.apache.org/dist/flink/KEYS
>>
>> The staging repository for this release can be found at:
>> https://repository.apache.org/content/repositories/orgapacheflink-1122
>>
>> -
>>
>>
>> The vote ends on Tuesday (May 30th), 7pm CET.
>>
>> [ ] +1 Release this package as Apache Flink 1.3.0
>> [ ] -1 Do not release this package, because ...
>>
>>
>>
>


[jira] [Created] (FLINK-6761) Limitation for maximum state size per key in RocksDB backend

2017-05-29 Thread Stefan Richter (JIRA)
Stefan Richter created FLINK-6761:
-

 Summary: Limitation for maximum state size per key in RocksDB 
backend
 Key: FLINK-6761
 URL: https://issues.apache.org/jira/browse/FLINK-6761
 Project: Flink
  Issue Type: Bug
  Components: State Backends, Checkpointing
Affects Versions: 1.2.1, 1.3.0
Reporter: Stefan Richter
Priority: Critical


RocksDB's JNI bridge allows for putting and getting `byte[]` as keys and 
values. 
States that internally use RocksDB's merge operator, e.g. `ListState`, can 
currently merge multiple `byte[]` entries under one key, which are internally 
concatenated into one value in RocksDB.

This becomes problematic, as soon as the accumulated state size under one key 
grows larger than `Integer.MAX_VALUE` bytes. Whenever Java code tries to access 
a state that grew beyond this limit through merging, we will encounter an 
`ArrayIndexOutOfBoundsException` at best and a segfault at worst.

This behaviour is problematic, because RocksDB silently stores states that 
exceed this limitation, but on access (e.g. in checkpointing), the code fails 
unexpectedly.

I think the only proper solution to this is for RocksDB's JNI bridge to build 
on `(Direct)ByteBuffer` - which can work around the size limitation - as input 
and output types, instead of simple `byte[]`.
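
Until that is possible, an illustrative guard like the following could fail fast 
instead of running into an `ArrayIndexOutOfBoundsException` or a segfault later. 
This is a sketch of the idea, not existing Flink code, and the per-entry 
overhead constant is an assumption:

{code:java}
// Illustrative guard: reject a merge that would push the accumulated value
// under a single RocksDB key beyond what a Java byte[] can hold.
class MergedValueSizeGuard {

    private long accumulatedBytes;

    void beforeMerge(byte[] newEntry) {
        // +4 stands in for any per-entry framing overhead added on merge
        long projected = accumulatedBytes + newEntry.length + 4;
        if (projected > Integer.MAX_VALUE) {
            throw new IllegalStateException(
                "Merged state under this key would exceed 2^31-1 bytes and could "
                + "no longer be read back through the byte[]-based JNI bridge.");
        }
        accumulatedBytes = projected;
    }
}
{code}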



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (FLINK-6760) Fix OverWindowTest alias test error

2017-05-29 Thread sunjincheng (JIRA)
sunjincheng created FLINK-6760:
--

 Summary: Fix OverWindowTest alias test error
 Key: FLINK-6760
 URL: https://issues.apache.org/jira/browse/FLINK-6760
 Project: Flink
  Issue Type: Sub-task
  Components: Table API & SQL
Affects Versions: 1.3.0
Reporter: sunjincheng
Assignee: sunjincheng


For the SQL:
{code}
val sql = "SELECT c, count(a) OVER (ORDER BY proctime ROWS BETWEEN 2 preceding 
AND CURRENT ROW) as cnt1 from MyTable"
{code}

The alias `cnt1` did not take effect when we generated the plan string, but we 
can use the alias in an outer query, for example:

{code}
val sql = "SELECT cnt1 from (SELECT c, count(a) OVER (ORDER BY proctime ROWS 
BETWEEN 2 preceding AND CURRENT ROW) as cnt1 from MyTable)"
{code}

So in this JIRA we just fix the test case for the 1.3 release. We will improve 
the alias handling in another JIRA.





--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


Re: [VOTE] Release Apache Flink 1.3.0 (RC3)

2017-05-29 Thread Chesnay Schepler

+1

 * Builds from source
 * start/stop scripts work
 * logs don't show anything suspicious on startup/shutdown
 * ran some example jobs
 * ran jobs on yarn with exactly-once & RocksDB
 o canceling with savepoint/resuming from savepoint
 o without/with rescaling
 * SideOutputs work
 * Metrics are properly transmitted & displayed in the webUI

On 28.05.2017 20:33, Robert Metzger wrote:

@Shaoxuan, I don't think missing documentation is a release blocker, but
it's something we should fix asap and with high priority :)
Since the documentation is not bundled with the release, and the docs are
built off the "release-x.y" branch, we can always update them (even after
the release).

However, it makes sense to have the docs ready when we announce the release
so that users can understand how to use the newly released features.

On Sat, May 27, 2017 at 6:14 PM, Chesnay Schepler 
wrote:


I've responded in the JIRA. In my opinion it isn't a functional issue, but
more about improving
error messages/documentation.


On 27.05.2017 18:07, Gyula Fóra wrote:


Hi!
I have found this issue: https://issues.apache.org/jira/browse/FLINK-6742

Not sure if it's a blocker or not (not even completely sure what causes it
at the moment)

Gyula

Shaoxuan Wang wrote (on 2017 May 27, Sat, 6:30):

Hi Robert.

Will doc update be a blocker for release?
Release 1.3 has many updates on tableAPI and SQL, and the docs are kind
of
lagging. We are trying our best to update the doc as much as possible
before the release is officially published, but I am hoping this is not a
blocker for the release.

Shaoxuan



On Sat, May 27, 2017 at 12:58 AM, Robert Metzger 
wrote:

Hi all,

this is the second VOTEing release candidate for Flink 1.3.0

The commit to be voted on:
760eea8a
(http://git-wip-us.apache.org/repos/asf/flink/commit/760eea8a)

Branch:
release-1.3.0-rc3

The release artifacts to be voted on can be found at:
http://people.apache.org/~rmetzger/flink-1.3.0-rc3


The release artifacts are signed with the key with fingerprint D9839159:
http://www.apache.org/dist/flink/KEYS

The staging repository for this release can be found at:
https://repository.apache.org/content/repositories/orgapacheflink-1122


Re: [DISCUSS] Backwards compatibility policy.

2017-05-29 Thread Aljoscha Krettek
Normally, I’m the first one to suggest removing everything that is not 
absolutely necessary in order to have a clean code base. On this issue, though, 
I think we should support restoring from old Savepoints as far back as possible 
if it does not make the code completely unmaintainable. Some users might jump 
versions, and always forcing them to go through every version between their old 
version and the current one doesn't seem feasible and might put off some 
users.

So far, I think the burden of supporting restore from 1.1 is still small enough 
and with each new version the changes between versions become less and less. 
The changes from 1.2 to the upcoming 1.3 are quite minimal, I think.

Best,
Aljoscha
> On 24. May 2017, at 17:58, Ted Yu  wrote:
> 
> bq. about having LTS versions once a year
> 
> +1 to the above.
> 
> There may be various reasons users don't want to upgrade (after new
> releases come out). We should give such users enough flexibility on the
> upgrade path.
> 
> Cheers
> 
> On Wed, May 24, 2017 at 8:39 AM, Kostas Kloudas > wrote:
> 
>> Hi all,
>> 
>> For the proposal of having a third party tool, I agree with Ted.
>> Maintaining
>> it is a big and far from trivial effort.
>> 
>> Now for the window of backwards compatibility, I would argue that even if
>> for some users 4 months (1 release) is not enough to bump their Flink
>> version,
>> the proposed policy guarantees that there will always be a path from any
>> old
>> version to any subsequent one.
>> 
>> Finally, for the proposal about having LTS versions once a year, I am not
>> sure if this will reduce or create more overhead. If I understand the plan
>> correctly, this would mean that the community will have to maintain
>> 2 or 3 LTS versions and the last two major ones, right?
>> 
>>> On May 22, 2017, at 7:31 PM, Ted Yu  wrote:
>>> 
>>> For #2, it is difficult to achieve:
>>> 
>>> a. maintaining savepoint migration is non-trivial and should be reviewed
>> by
>>> domain experts
>>> b. how to certify such third-party tool
>>> 
>>> Cheers
>>> 
>>> On Mon, May 22, 2017 at 3:04 AM, 施晓罡  wrote:
>>> 
 Hi all,
 
 Currently, we spend a lot of effort on maintaining compatibility.
 There is much code in the runtime to support the migration of savepoints
 (most of which is deprecated), making it hard to focus on the current
 implementation.
 When more versions are released, much more effort will be needed if we
 try to keep all these released versions compatible.
 
 I agree with Tzu-Li that we should provide a way to let users upgrade
 Flink at a reasonable pace.
 But I am against the proposal that we only offer backwards compatibility
 for one previous version.
 According to our time-based release model, a major version is released
 every four months.
 That means users would have to upgrade their versions every 8 months;
 otherwise they will have difficulties migrating their existing savepoints.
 
 My suggestions include
 
 (1) We can release Long-Term Support (LTS) versions, which are widely
 adopted in other open-source projects.
 LTS versions should be stable and free of known bugs. Savepoints in
 LTS versions are guaranteed to be backwards compatible so that users can
 easily upgrade to newer LTS versions.
 
 LTS versions would be released more slowly than major versions (maybe
 once a year, determined by users' upgrade frequency).
 Each LTS version will be supported for a period of time, and typically
 there are no more than three active LTS versions.
 By encouraging users to use LTS versions, we can ease the maintenance of
 released versions (bug fixes, backwards compatibility, and critical
 performance improvements).
 
 (2) We can provide a third-party tool to migrate old-versioned savepoints.
 When users upgrade their versions, they can use the provided tool to
 migrate existing savepoints.
 This would move the code for savepoint migration out of the actual
 codebase, keeping it focused on the current implementation.
 
 What do you think?
 
 Regards,
 Xiaogang
 
 
> On May 22, 2017, at 1:39 PM, Tzu-Li (Gordon) Tai wrote:
> 
> Hi Kostas,
> 
> Thanks for bringing this up!
> I think it is reasonable to keep this coherent with our time-based
 release model guarantees.
> 
> With the time-based release model, there is a guarantee that the
 current latest major version and the previous one are supported.
> For example, upon releasing 1.3, only 1.3 and 1.2 will still be
 supported by the community for any required bug fixes.
> I think this was initially decided not only to ease old-version
 maintenance efforts for the community, but also as a means to let users
 upgrade their Flink versions at a reasonable pace (at least every other
 major release).
> 
> Therefore, I think its 

[jira] [Created] (FLINK-6759) storm-examples cannot be built without cached dependencies

2017-05-29 Thread Chesnay Schepler (JIRA)
Chesnay Schepler created FLINK-6759:
---

 Summary: storm-examples cannot be built without cached dependencies
 Key: FLINK-6759
 URL: https://issues.apache.org/jira/browse/FLINK-6759
 Project: Flink
  Issue Type: Bug
  Components: Build System
Affects Versions: 1.4.0
Reporter: Chesnay Schepler


The {{flink-storm-examples}} module fails to build if the 
{{flink-examples-batch}} dependency is not present in the local cache.

{code}
[ERROR] Failed to execute goal on project flink-storm-examples_2.10: Could not resolve dependencies for project org.apache.flink:flink-storm-examples_2.10:jar:1.4-SNAPSHOT:
Failed to collect dependencies at org.apache.flink:flink-examples-batch_2.10:jar:1.4-SNAPSHOT: Failed to read artifact descriptor for org.apache.flink:flink-examples-batch_2.10:jar:1.4-SNAPSHOT:
Failure to find org.apache.flink:flink-examples_${scala.binary.version}:pom:1.4-SNAPSHOT in https://repository.apache.org/snapshots was cached in the local repository, resolution will not be reattempted until the update interval of apache.snapshots has elapsed or updates are forced -> [Help 1]
{code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (FLINK-6758) Loaded configuration values are logged twice

2017-05-29 Thread Chesnay Schepler (JIRA)
Chesnay Schepler created FLINK-6758:
---

 Summary: Loaded configuration values are logged twice
 Key: FLINK-6758
 URL: https://issues.apache.org/jira/browse/FLINK-6758
 Project: Flink
  Issue Type: Bug
  Components: Local Runtime
Affects Versions: 1.3.0, 1.4.0
Reporter: Chesnay Schepler
Priority: Trivial


When starting a Job- or TaskManager, I found the following duplicated lines in 
the logs:

{code}
2017-05-29 09:28:07,834 INFO  org.apache.flink.runtime.taskmanager.TaskManager  
- Loading configuration from /home/Zento/rc3/dist/flink-1.3.0/conf
2017-05-29 09:28:07,837 INFO  
org.apache.flink.configuration.GlobalConfiguration- Loading 
configuration property: jobmanager.rpc.address, localhost
2017-05-29 09:28:07,837 INFO  
org.apache.flink.configuration.GlobalConfiguration- Loading 
configuration property: jobmanager.rpc.port, 6123
2017-05-29 09:28:07,837 INFO  
org.apache.flink.configuration.GlobalConfiguration- Loading 
configuration property: jobmanager.heap.mb, 1024
2017-05-29 09:28:07,837 INFO  
org.apache.flink.configuration.GlobalConfiguration- Loading 
configuration property: taskmanager.heap.mb, 1024
2017-05-29 09:28:07,837 INFO  
org.apache.flink.configuration.GlobalConfiguration- Loading 
configuration property: taskmanager.numberOfTaskSlots, 1
2017-05-29 09:28:07,837 INFO  
org.apache.flink.configuration.GlobalConfiguration- Loading 
configuration property: taskmanager.memory.preallocate, false
2017-05-29 09:28:07,838 INFO  
org.apache.flink.configuration.GlobalConfiguration- Loading 
configuration property: parallelism.default, 1
2017-05-29 09:28:07,838 INFO  
org.apache.flink.configuration.GlobalConfiguration- Loading 
configuration property: jobmanager.web.port, 8081
2017-05-29 09:28:07,847 INFO  
org.apache.flink.configuration.GlobalConfiguration- Loading 
configuration property: jobmanager.rpc.address, localhost
2017-05-29 09:28:07,847 INFO  
org.apache.flink.configuration.GlobalConfiguration- Loading 
configuration property: jobmanager.rpc.port, 6123
2017-05-29 09:28:07,847 INFO  
org.apache.flink.configuration.GlobalConfiguration- Loading 
configuration property: jobmanager.heap.mb, 1024
2017-05-29 09:28:07,848 INFO  
org.apache.flink.configuration.GlobalConfiguration- Loading 
configuration property: taskmanager.heap.mb, 1024
2017-05-29 09:28:07,848 INFO  
org.apache.flink.configuration.GlobalConfiguration- Loading 
configuration property: taskmanager.numberOfTaskSlots, 1
2017-05-29 09:28:07,848 INFO  
org.apache.flink.configuration.GlobalConfiguration- Loading 
configuration property: taskmanager.memory.preallocate, false
2017-05-29 09:28:07,848 INFO  
org.apache.flink.configuration.GlobalConfiguration- Loading 
configuration property: parallelism.default, 1
2017-05-29 09:28:07,848 INFO  
org.apache.flink.configuration.GlobalConfiguration- Loading 
configuration property: jobmanager.web.port, 8081
{code}




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (FLINK-6757) Investigate Apache Atlas integration

2017-05-29 Thread Till Rohrmann (JIRA)
Till Rohrmann created FLINK-6757:


 Summary: Investigate Apache Atlas integration
 Key: FLINK-6757
 URL: https://issues.apache.org/jira/browse/FLINK-6757
 Project: Flink
  Issue Type: Wish
  Components: Streaming Connectors
Reporter: Till Rohrmann


Users asked for an integration of Apache Flink with Apache Atlas. It might be 
worthwhile to investigate what is necessary to achieve this task.

References:
http://atlas.incubator.apache.org/StormAtlasHook.html



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (FLINK-6756) Provide RichAsyncFunction to Scala API suite

2017-05-29 Thread Andrea Spina (JIRA)
Andrea Spina created FLINK-6756:
---

 Summary: Provide RichAsyncFunction to Scala API suite
 Key: FLINK-6756
 URL: https://issues.apache.org/jira/browse/FLINK-6756
 Project: Flink
  Issue Type: Improvement
  Components: DataStream API
Reporter: Andrea Spina


I can't find any tracking info about plans to have a RichAsyncFunction in 
the Scala API suite. I think it'd be nice to have this function in order to 
access the open/close methods and the RuntimeContext.

I was able to retrieve 
http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/There-is-no-Open-and-Close-method-in-Async-I-O-API-of-Scala-td11591.html#a11593
 only, so my question is whether there are any blocking issues preventing this 
feature. [~till.rohrmann]

If it's possible and nobody has already done it, I can assign the issue to 
myself and implement it.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (FLINK-6755) Allow triggering Checkpoints through command line client

2017-05-29 Thread Gyula Fora (JIRA)
Gyula Fora created FLINK-6755:
-

 Summary: Allow triggering Checkpoints through command line client
 Key: FLINK-6755
 URL: https://issues.apache.org/jira/browse/FLINK-6755
 Project: Flink
  Issue Type: New Feature
  Components: Client, State Backends, Checkpointing
Affects Versions: 1.3.0
Reporter: Gyula Fora


The command line client currently only allows triggering (and canceling with) 
Savepoints. 

While this is good if we want to fork or modify the pipelines in a 
non-checkpoint compatible way, now with incremental checkpoints this becomes 
wasteful for simple job restarts/pipeline updates. 

I suggest we add a new command: 
./bin/flink checkpoint <jobID> [checkpointDirectory]

and a new flag -c for the cancel command to indicate we want to trigger a 
checkpoint:
./bin/flink cancel -c [targetDirectory] <jobID>

Otherwise this can work similarly to the current savepoint-taking logic; we 
could probably even piggyback on the current messages by adding a boolean flag 
indicating whether it should be a savepoint or a checkpoint.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (FLINK-6754) Savepoints don't respect client timeout config

2017-05-29 Thread Gyula Fora (JIRA)
Gyula Fora created FLINK-6754:
-

 Summary: Savepoints don't respect client timeout config
 Key: FLINK-6754
 URL: https://issues.apache.org/jira/browse/FLINK-6754
 Project: Flink
  Issue Type: Bug
  Components: Client, Configuration
Reporter: Gyula Fora
Priority: Trivial


Savepoints have a hardcoded timeout:

Future response = jobManager.ask(new TriggerSavepoint(jobId, Option.apply(savepointDirectory)), new FiniteDuration(1, TimeUnit.HOURS));
...
result = Await.result(response, FiniteDuration.Inf());
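
A minimal sketch of the direction a fix could take, assuming the ask-timeout is 
derived from the client configuration instead of being hardcoded, and assuming 
the Scala library used by Flink is on the classpath; the class and method names 
below are illustrative, not the actual CliFrontend code:

{code:java}
import java.util.concurrent.TimeUnit;
import scala.concurrent.duration.FiniteDuration;

class SavepointTimeoutSketch {

    /**
     * clientTimeoutSeconds would come from the client configuration; only fall
     * back to the current one-hour default if nothing is configured.
     */
    static FiniteDuration savepointAskTimeout(long clientTimeoutSeconds) {
        long seconds = clientTimeoutSeconds > 0 ? clientTimeoutSeconds : 3600;
        return new FiniteDuration(seconds, TimeUnit.SECONDS);
    }
}
{code}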






--
This message was sent by Atlassian JIRA
(v6.3.15#6346)