[jira] [Commented] (FLINK-5436) UDF state without CheckpointedRestoring can result in restarting loop

2017-01-22 Thread Ufuk Celebi (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-5436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15833467#comment-15833467
 ] 

Ufuk Celebi commented on FLINK-5436:


[~stefanrichte...@gmail.com] and [~aljoscha] should we address this and 
FLINK-5437 for the upcoming release or not?

> UDF state without CheckpointedRestoring can result in restarting loop
> -
>
> Key: FLINK-5436
> URL: https://issues.apache.org/jira/browse/FLINK-5436
> Project: Flink
>  Issue Type: Improvement
>  Components: State Backends, Checkpointing
>Reporter: Ufuk Celebi
>Priority: Minor
>
> When restoring a job that has Checkpointed state but whose UDF does not 
> implement the new CheckpointedRestoring interface, the job will be restarted 
> over and over again (depending on the configured restart strategy).
> Since this is not recoverable, we should fail the job immediately.
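The fail-fast idea can be sketched in a few lines. This is an illustrative model only (the class and method names are hypothetical, not Flink's actual restore code): if legacy state is found but the UDF cannot restore it, throw a non-recoverable error instead of letting the restart strategy loop forever.

```java
// Hypothetical sketch of failing fast on restore. Names are illustrative,
// not Flink's actual restore path.
public class RestoreCheck {
    interface CheckpointedRestoring {}                    // stand-in for the Flink interface
    static class PlainUdf {}                              // UDF missing the interface
    static class RestoringUdf implements CheckpointedRestoring {}

    static void checkRestorable(Object udf, boolean hasLegacyState) {
        if (hasLegacyState && !(udf instanceof CheckpointedRestoring)) {
            // Fail the job immediately: restarting cannot fix this.
            throw new IllegalStateException(
                "UDF has legacy Checkpointed state but does not implement "
                + "CheckpointedRestoring; failing the job instead of restarting.");
        }
    }

    static boolean isRestorable(Object udf, boolean hasLegacyState) {
        try {
            checkRestorable(udf, hasLegacyState);
            return true;
        } catch (IllegalStateException e) {
            return false;
        }
    }
}
```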



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Closed] (FLINK-5530) race condition in AbstractRocksDBState#getSerializedValue

2017-01-22 Thread Ufuk Celebi (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-5530?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ufuk Celebi closed FLINK-5530.
--
   Resolution: Fixed
Fix Version/s: 1.3.0
   1.2.0

Fixed in d8222c1 (release-1.2), d16552d (master).

> race condition in AbstractRocksDBState#getSerializedValue
> -
>
> Key: FLINK-5530
> URL: https://issues.apache.org/jira/browse/FLINK-5530
> Project: Flink
>  Issue Type: Bug
>  Components: Queryable State
>Affects Versions: 1.2.0
>Reporter: Nico Kruber
>Assignee: Nico Kruber
>Priority: Blocker
> Fix For: 1.2.0, 1.3.0
>
>
> AbstractRocksDBState#getSerializedValue() uses the same key serialisation 
> stream as the ordinary state access methods, but it is called in parallel 
> during state queries, violating the assumption that only one thread accesses 
> the stream. This may lead to wrong query results or corrupted data while 
> queries are executed.
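The underlying hazard can be illustrated with a small self-contained sketch (plain Java, not Flink's actual classes): two threads sharing one output stream for key serialization can interleave `reset()` and `write()` calls, whereas giving each caller its own stream removes the shared mutable state entirely.

```java
import java.io.ByteArrayOutputStream;

// Illustration of the race, not Flink's actual code.
public class SerializedValueSketch {
    // Unsafe pattern: a shared mutable stream, as in the pre-fix code path.
    // A concurrent query thread may call reset() while the task thread is
    // mid-write, producing a corrupted or wrong serialized key.
    private final ByteArrayOutputStream shared = new ByteArrayOutputStream();

    byte[] serializeShared(byte[] key) {
        shared.reset();
        shared.write(key, 0, key.length);
        return shared.toByteArray();
    }

    // Safe pattern: every call gets its own stream, so query threads cannot
    // interfere with the task thread's ordinary state accesses.
    static byte[] serializeIsolated(byte[] key) {
        ByteArrayOutputStream local = new ByteArrayOutputStream();
        local.write(key, 0, key.length);
        return local.toByteArray();
    }
}
```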





[jira] [Created] (FLINK-5602) Migration with RocksDB job led to NPE for next checkpoint

2017-01-21 Thread Ufuk Celebi (JIRA)
Ufuk Celebi created FLINK-5602:
--

 Summary: Migration with RocksDB job led to NPE for next checkpoint
 Key: FLINK-5602
 URL: https://issues.apache.org/jira/browse/FLINK-5602
 Project: Flink
  Issue Type: Bug
Reporter: Ufuk Celebi


When migrating a job with RocksDB, I got the following exception when the next 
checkpoint was triggered. This only happened once and I could not reproduce it 
since.

[~stefanrichte...@gmail.com] Maybe we can look over the code and check what 
could have failed here? Unfortunately, I don't have more of the stack trace 
available. I don't think it will be very helpful, will it?

{code}
at 
org.apache.flink.util.Preconditions.checkNotNull(Preconditions.java:58)
at 
org.apache.flink.runtime.state.KeyedBackendSerializationProxy$StateMetaInfo.(KeyedBackendSerializationProxy.java:126)
at 
org.apache.flink.contrib.streaming.state.RocksDBKeyedStateBackend$RocksDBSnapshotOperation.writeKVStateMetaData(RocksDBKeyedStateBackend.java:471)
at 
org.apache.flink.contrib.streaming.state.RocksDBKeyedStateBackend$RocksDBSnapshotOperation.writeDBSnapshot(RocksDBKeyedStateBackend.java:382)
at 
org.apache.flink.contrib.streaming.state.RocksDBKeyedStateBackend$1.performOperation(RocksDBKeyedStateBackend.java:280)
at 
org.apache.flink.contrib.streaming.state.RocksDBKeyedStateBackend$1.performOperation(RocksDBKeyedStateBackend.java:262)
at 
org.apache.flink.runtime.io.async.AbstractAsyncIOCallable.call(AbstractAsyncIOCallable.java:72)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at 
org.apache.flink.util.FutureUtil.runIfNotDoneAndGet(FutureUtil.java:37)
... 6 more
{code}





[jira] [Created] (FLINK-5601) Window operator does not checkpoint watermarks

2017-01-21 Thread Ufuk Celebi (JIRA)
Ufuk Celebi created FLINK-5601:
--

 Summary: Window operator does not checkpoint watermarks
 Key: FLINK-5601
 URL: https://issues.apache.org/jira/browse/FLINK-5601
 Project: Flink
  Issue Type: Improvement
  Components: State Backends, Checkpointing
Reporter: Ufuk Celebi


During release testing [~stefanrichte...@gmail.com] and I noticed that 
watermarks are not checkpointed in the window operator.

This can lead to non-determinism when restoring checkpoints. I was running an 
adjusted {{SessionWindowITCase}} via Kafka to test migration and rescaling 
and ran into failures, because the data generator required deterministic 
behaviour.

What happened was that on restore late elements were sometimes not dropped, 
because the watermarks first needed to be re-established after the restore.

[~aljoscha] Do you know whether there is a special reason for explicitly not 
checkpointing watermarks?
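The effect can be modeled in a few lines (a toy model, not the actual window operator): if the current watermark is included in the snapshot and restored, late-element dropping stays deterministic; if it resets to {{Long.MIN_VALUE}}, late elements slip through until a new watermark arrives.

```java
// Toy model of watermark checkpointing; illustrative names only.
public class WatermarkModel {
    long currentWatermark = Long.MIN_VALUE;   // reset state after a restore without watermarks

    // An element is late if its timestamp is at or behind the watermark.
    boolean isLate(long eventTimestamp) {
        return eventTimestamp <= currentWatermark;
    }

    long snapshot() {                          // include the watermark in the checkpoint
        return currentWatermark;
    }

    void restore(long snapshottedWatermark) {  // re-establish it on restore
        currentWatermark = snapshottedWatermark;
    }
}
```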







[jira] [Created] (FLINK-5600) Improve error message when triggering savepoint without specified directory

2017-01-21 Thread Ufuk Celebi (JIRA)
Ufuk Celebi created FLINK-5600:
--

 Summary: Improve error message when triggering savepoint without 
specified directory
 Key: FLINK-5600
 URL: https://issues.apache.org/jira/browse/FLINK-5600
 Project: Flink
  Issue Type: Improvement
  Components: State Backends, Checkpointing
Reporter: Ufuk Celebi
Priority: Minor


When triggering a savepoint without specifying a custom target directory and 
without a configured default directory, we get quite a long stack trace:

{code}
java.lang.Exception: Failed to trigger savepoint
at 
org.apache.flink.runtime.jobmanager.JobManager$$anonfun$handleMessage$1.applyOrElse(JobManager.scala:801)
at 
scala.runtime.AbstractPartialFunction$mcVL$sp.apply$mcVL$sp(AbstractPartialFunction.scala:33)
at 
scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:33)
at 
scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:25)
at 
org.apache.flink.runtime.LeaderSessionMessageFilter$$anonfun$receive$1.applyOrElse(LeaderSessionMessageFilter.scala:36)
at 
scala.runtime.AbstractPartialFunction$mcVL$sp.apply$mcVL$sp(AbstractPartialFunction.scala:33)
at 
scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:33)
at 
scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:25)
at 
org.apache.flink.runtime.LogMessages$$anon$1.apply(LogMessages.scala:33)
at 
org.apache.flink.runtime.LogMessages$$anon$1.apply(LogMessages.scala:28)
at scala.PartialFunction$class.applyOrElse(PartialFunction.scala:118)
at 
org.apache.flink.runtime.LogMessages$$anon$1.applyOrElse(LogMessages.scala:28)
at akka.actor.Actor$class.aroundReceive(Actor.scala:467)
at 
org.apache.flink.runtime.jobmanager.JobManager.aroundReceive(JobManager.scala:118)
at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516)
at akka.actor.ActorCell.invoke(ActorCell.scala:487)
at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:238)
at akka.dispatch.Mailbox.run(Mailbox.scala:220)
at 
akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:397)
at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
at 
scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
at 
scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
at 
scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
Caused by: java.lang.IllegalStateException: No savepoint directory configured. 
You can either specify a directory when triggering this savepoint or configure 
a cluster-wide default via key 'state.savepoints.dir'.
at 
org.apache.flink.runtime.jobmanager.JobManager$$anonfun$handleMessage$1.applyOrElse(JobManager.scala:764)
... 22 more
{code}

This is already quite good, because the exception says what can be done to work 
around the problem. We can make it even better by handling this error in the 
client and printing a more explicit message.
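The client-side handling could look roughly like this (a hypothetical helper, not the actual CliFrontend code): walk the cause chain, detect the "no savepoint directory" failure, and print a short actionable message instead of the full stack trace.

```java
// Hypothetical sketch of client-side error handling for savepoint triggering.
public class SavepointCliSketch {
    static String userMessage(Throwable t) {
        // Walk the cause chain looking for the known configuration error.
        for (Throwable c = t; c != null; c = c.getCause()) {
            if (c instanceof IllegalStateException
                    && String.valueOf(c.getMessage()).contains("No savepoint directory")) {
                return "No savepoint directory configured. Pass a target directory "
                     + "when triggering the savepoint or set 'state.savepoints.dir' "
                     + "in the cluster configuration.";
            }
        }
        return t.toString();   // unknown failure: fall back to the raw error
    }
}
```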





[jira] [Created] (FLINK-5599) State interface docs often refer to keyed state only

2017-01-21 Thread Ufuk Celebi (JIRA)
Ufuk Celebi created FLINK-5599:
--

 Summary: State interface docs often refer to keyed state only
 Key: FLINK-5599
 URL: https://issues.apache.org/jira/browse/FLINK-5599
 Project: Flink
  Issue Type: Improvement
  Components: State Backends, Checkpointing
Reporter: Ufuk Celebi
Priority: Minor


The JavaDocs of the {{State}} interface (and related classes) often mention 
only keyed state, because the state interface was exposed only for keyed state 
until Flink 1.1. With the new {{CheckpointedFunction}} interface this has 
changed, and the docs should be adjusted accordingly.

It would be nice to address this with 1.2.0 so that the JavaDocs are updated 
for users. [~stefanrichte...@gmail.com] or [~aljoscha], maybe you can have a 
brief look at this?







[jira] [Closed] (FLINK-5528) tests: reduce the retry delay in QueryableStateITCase

2017-01-20 Thread Ufuk Celebi (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-5528?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ufuk Celebi closed FLINK-5528.
--
   Resolution: Fixed
Fix Version/s: 1.3.0
   1.2.0

Fixed in 03a1f25 (release-1.2), 8d64263 (master).

> tests: reduce the retry delay in QueryableStateITCase
> -
>
> Key: FLINK-5528
> URL: https://issues.apache.org/jira/browse/FLINK-5528
> Project: Flink
>  Issue Type: Improvement
>  Components: Queryable State
>Affects Versions: 1.2.0
>Reporter: Nico Kruber
>Assignee: Nico Kruber
> Fix For: 1.2.0, 1.3.0
>
>
> The QueryableStateITCase uses a retry delay of 1 second, e.g. if a queried 
> key does not exist yet. This seems too conservative, as the job may not take 
> that long to deploy, and especially since getKvStateWithRetries() recovers 
> from failures by retrying anyway.





[jira] [Closed] (FLINK-5468) Restoring from a semi async rocksdb statebackend (1.1) to 1.2 fails with ClassNotFoundException

2017-01-20 Thread Ufuk Celebi (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-5468?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ufuk Celebi closed FLINK-5468.
--
   Resolution: Fixed
Fix Version/s: 1.3.0
   1.2.0

Fixed in d431373 (release-1.2), 988729e (master).

> Restoring from a semi async rocksdb statebackend (1.1) to 1.2 fails with 
> ClassNotFoundException
> ---
>
> Key: FLINK-5468
> URL: https://issues.apache.org/jira/browse/FLINK-5468
> Project: Flink
>  Issue Type: Bug
>  Components: State Backends, Checkpointing
>Affects Versions: 1.2.0
>Reporter: Robert Metzger
>Assignee: Stefan Richter
> Fix For: 1.2.0, 1.3.0
>
>
> I think we should catch this exception and explain what's going on and how 
> users can resolve the issue.
> {code}
> org.apache.flink.client.program.ProgramInvocationException: The program 
> execution failed: Job execution failed
>   at 
> org.apache.flink.client.program.ClusterClient.run(ClusterClient.java:427)
>   at 
> org.apache.flink.yarn.YarnClusterClient.submitJob(YarnClusterClient.java:210)
>   at 
> org.apache.flink.client.program.ClusterClient.run(ClusterClient.java:400)
>   at 
> org.apache.flink.streaming.api.environment.StreamContextEnvironment.execute(StreamContextEnvironment.java:66)
>   at com.dataartisans.eventwindow.Generator.main(Generator.java:60)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:606)
>   at 
> org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:528)
>   at 
> org.apache.flink.client.program.PackagedProgram.invokeInteractiveModeForExecution(PackagedProgram.java:419)
>   at 
> org.apache.flink.client.program.ClusterClient.run(ClusterClient.java:339)
>   at 
> org.apache.flink.client.CliFrontend.executeProgram(CliFrontend.java:831)
>   at org.apache.flink.client.CliFrontend.run(CliFrontend.java:256)
>   at 
> org.apache.flink.client.CliFrontend.parseParameters(CliFrontend.java:1073)
>   at org.apache.flink.client.CliFrontend$2.call(CliFrontend.java:1120)
>   at org.apache.flink.client.CliFrontend$2.call(CliFrontend.java:1117)
>   at 
> org.apache.flink.runtime.security.HadoopSecurityContext$1.run(HadoopSecurityContext.java:43)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:415)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
>   at 
> org.apache.flink.runtime.security.HadoopSecurityContext.runSecured(HadoopSecurityContext.java:40)
>   at org.apache.flink.client.CliFrontend.main(CliFrontend.java:1116)
> Caused by: org.apache.flink.runtime.client.JobExecutionException: Job 
> execution failed
>   at 
> org.apache.flink.runtime.client.JobClient.awaitJobResult(JobClient.java:328)
>   at 
> org.apache.flink.runtime.client.JobClient.submitJobAndWait(JobClient.java:382)
>   at 
> org.apache.flink.client.program.ClusterClient.run(ClusterClient.java:423)
>   ... 22 more
> Caused by: java.io.IOException: java.lang.ClassNotFoundException: 
> org.apache.flink.contrib.streaming.state.RocksDBStateBackend$FinalSemiAsyncSnapshot
>   at 
> org.apache.flink.migration.runtime.checkpoint.savepoint.SavepointV0Serializer.deserialize(SavepointV0Serializer.java:162)
>   at 
> org.apache.flink.migration.runtime.checkpoint.savepoint.SavepointV0Serializer.deserialize(SavepointV0Serializer.java:70)
>   at 
> org.apache.flink.runtime.checkpoint.savepoint.SavepointStore.loadSavepoint(SavepointStore.java:138)
>   at 
> org.apache.flink.runtime.checkpoint.savepoint.SavepointLoader.loadAndValidateSavepoint(SavepointLoader.java:64)
>   at 
> org.apache.flink.runtime.jobmanager.JobManager$$anonfun$org$apache$flink$runtime$jobmanager$JobManager$$submitJob$1.apply$mcV$sp(JobManager.scala:1348)
>   at 
> org.apache.flink.runtime.jobmanager.JobManager$$anonfun$org$apache$flink$runtime$jobmanager$JobManager$$submitJob$1.apply(JobManager.scala:1330)
>   at 
> org.apache.flink.runtime.jobmanager.JobManager$$anonfun$org$apache$flink$runtime$jobmanager$JobManager$$submitJob$1.apply(JobManager.scala:1330)
>   at 
> scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24)
>   at 
> scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24)
>   at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:40)
>   ...
> {code}

[jira] [Closed] (FLINK-5561) DataInputDeserializer#available returns one too few

2017-01-20 Thread Ufuk Celebi (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-5561?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ufuk Celebi closed FLINK-5561.
--
   Resolution: Fixed
Fix Version/s: 1.3.0
   1.2.0

Fixed in 108865d (release-1.2), 6c4644d (master).

> DataInputDeserializer#available returns one too few
> ---
>
> Key: FLINK-5561
> URL: https://issues.apache.org/jira/browse/FLINK-5561
> Project: Flink
>  Issue Type: Bug
>Affects Versions: 1.2.0
>Reporter: Nico Kruber
>Assignee: Nico Kruber
> Fix For: 1.2.0, 1.3.0
>
>
> DataInputDeserializer#available seems to assume that the position points to 
> the last read byte but instead it points to the next byte. Therefore, it 
> returns a value which is 1 smaller than the correct one.
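The off-by-one is easy to reproduce in isolation (a minimal model, not the actual DataInputDeserializer): since the position field points at the *next* byte to read, the bytes remaining are `end - position`, not `end - position - 1`.

```java
// Minimal model of the available() off-by-one; illustrative, not Flink's class.
public class AvailableSketch {
    final byte[] buffer;
    int position;          // index of the NEXT byte to read
    final int end;

    AvailableSketch(byte[] buffer) {
        this.buffer = buffer;
        this.position = 0;
        this.end = buffer.length;
    }

    byte read() {
        return buffer[position++];
    }

    int availableBuggy() { return end - position - 1; }   // pre-fix behaviour
    int availableFixed() { return end - position; }       // correct remaining count
}
```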





[jira] [Closed] (FLINK-5515) fix unused kvState.getSerializedValue call in KvStateServerHandler

2017-01-20 Thread Ufuk Celebi (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-5515?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ufuk Celebi closed FLINK-5515.
--
   Resolution: Fixed
Fix Version/s: 1.3.0
   1.2.0

Fixed in 6e85106 (release-1.2), ddd7c36 (master).

> fix unused kvState.getSerializedValue call in KvStateServerHandler
> --
>
> Key: FLINK-5515
> URL: https://issues.apache.org/jira/browse/FLINK-5515
> Project: Flink
>  Issue Type: Improvement
>Reporter: Nico Kruber
> Fix For: 1.2.0, 1.3.0
>
>
> This was added in 4809f5367b08a9734fc1bd4875be51a9f3bb65aa and is probably a 
> left-over from a merge.





[jira] [Created] (FLINK-5587) AsyncWaitOperatorTest timed out on Travis

2017-01-20 Thread Ufuk Celebi (JIRA)
Ufuk Celebi created FLINK-5587:
--

 Summary: AsyncWaitOperatorTest timed out on Travis
 Key: FLINK-5587
 URL: https://issues.apache.org/jira/browse/FLINK-5587
 Project: Flink
  Issue Type: Test
Reporter: Ufuk Celebi


The Maven watchdog script cancelled the build and printed a stack trace for 
{{AsyncWaitOperatorTest.testOperatorChainWithProcessingTime(AsyncWaitOperatorTest.java:379)}}.

https://s3.amazonaws.com/archive.travis-ci.org/jobs/192441719/log.txt





[jira] [Closed] (FLINK-5507) remove queryable list state sink

2017-01-20 Thread Ufuk Celebi (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-5507?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ufuk Celebi closed FLINK-5507.
--
   Resolution: Fixed
Fix Version/s: 1.3.0
   1.2.0

Fixed in 1c8a48f (release-1.2), 63a6af3 (master).

> remove queryable list state sink
> 
>
> Key: FLINK-5507
> URL: https://issues.apache.org/jira/browse/FLINK-5507
> Project: Flink
>  Issue Type: Improvement
>  Components: State Backends, Checkpointing
>Affects Versions: 1.2.0
>Reporter: Nico Kruber
>Assignee: Nico Kruber
> Fix For: 1.2.0, 1.3.0
>
>
> The queryable state "sink" using ListState 
> (".asQueryableState(, ListStateDescriptor)") stores all incoming data 
> forever and is never cleaned up. Eventually it will consume too much memory 
> and is thus of limited use.
> We should remove it from the API.





[jira] [Closed] (FLINK-5482) QueryableStateClient does not recover from a failed lookup due to a non-running job

2017-01-20 Thread Ufuk Celebi (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-5482?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ufuk Celebi closed FLINK-5482.
--
   Resolution: Fixed
Fix Version/s: 1.3.0
   1.2.0

Fixed in 1db8102 (release-1.2), 7ff7f43 (master).

> QueryableStateClient does not recover from a failed lookup due to a 
> non-running job
> ---
>
> Key: FLINK-5482
> URL: https://issues.apache.org/jira/browse/FLINK-5482
> Project: Flink
>  Issue Type: Bug
>Affects Versions: 1.2.0
>Reporter: Nico Kruber
>Assignee: Nico Kruber
> Fix For: 1.2.0, 1.3.0
>
>
> When the QueryableStateClient is used to issue a query but the job is not 
> running yet, its internal lookup result is cached with an 
> IllegalStateException saying that the job was not found. However, the client 
> never recovers from that.
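The recovery behaviour the fix needs can be sketched with a small lookup cache (hypothetical names, not the actual client code): a cached lookup that completed exceptionally must be replaced on the next query, so the client retries instead of replaying the cached failure forever.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.function.Function;

// Sketch of a lookup cache that evicts failed entries; illustrative only.
public class LookupCacheSketch {
    final ConcurrentMap<String, CompletableFuture<String>> cache = new ConcurrentHashMap<>();

    CompletableFuture<String> lookup(
            String jobId, Function<String, CompletableFuture<String>> doLookup) {
        return cache.compute(jobId, (id, cached) -> {
            // Reuse only pending or successful lookups.
            if (cached != null && !cached.isCompletedExceptionally()) {
                return cached;
            }
            return doLookup.apply(id);   // failed or absent: look up again
        });
    }
}
```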





[jira] [Closed] (FLINK-5556) BarrierBuffer resets bytes written on spiller roll over

2017-01-19 Thread Ufuk Celebi (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-5556?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ufuk Celebi closed FLINK-5556.
--
   Resolution: Fixed
Fix Version/s: 1.3.0
   1.2.0

Fixed in a4853ba (release-1.2), e1b2cd0 (master).


> BarrierBuffer resets bytes written on spiller roll over
> ---
>
> Key: FLINK-5556
> URL: https://issues.apache.org/jira/browse/FLINK-5556
> Project: Flink
>  Issue Type: Bug
>  Components: State Backends, Checkpointing
>Reporter: Ufuk Celebi
>Assignee: Ufuk Celebi
>Priority: Minor
> Fix For: 1.2.0, 1.3.0
>
>
> When rolling over a spilled sequence of buffers, the tracked number of bytes 
> written is reset to 0. It is reported to the checkpoint listener right after 
> this operation, which results in the reported buffered bytes always being 0.
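The ordering bug boils down to reset-before-report (illustrative names, not the actual BarrierBuffer code): capture the counter before resetting it, then report the captured value.

```java
// Sketch of the roll-over ordering bug; illustrative, not Flink's class.
public class SpillCounterSketch {
    long bytesWritten;

    long rollOverBuggy() {
        bytesWritten = 0;             // reset first...
        return bytesWritten;          // ...then report: always 0
    }

    long rollOverFixed() {
        long reported = bytesWritten; // capture before resetting
        bytesWritten = 0;
        return reported;              // report the real count
    }
}
```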





[jira] [Closed] (FLINK-5574) Add checkpoint statistics docs

2017-01-19 Thread Ufuk Celebi (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-5574?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ufuk Celebi closed FLINK-5574.
--
   Resolution: Fixed
Fix Version/s: 1.3.0
   1.2.0

Fixed in 75f3681 (release-1.2), 6fb9ebc (master).

> Add checkpoint statistics docs
> --
>
> Key: FLINK-5574
> URL: https://issues.apache.org/jira/browse/FLINK-5574
> Project: Flink
>  Issue Type: Improvement
>  Components: Documentation
>Reporter: Ufuk Celebi
>Assignee: Ufuk Celebi
>Priority: Minor
> Fix For: 1.2.0, 1.3.0
>
>
> Add docs about the current state of checkpoint monitoring.





[jira] [Created] (FLINK-5574) Add checkpoint statistics docs

2017-01-19 Thread Ufuk Celebi (JIRA)
Ufuk Celebi created FLINK-5574:
--

 Summary: Add checkpoint statistics docs
 Key: FLINK-5574
 URL: https://issues.apache.org/jira/browse/FLINK-5574
 Project: Flink
  Issue Type: Improvement
  Components: Documentation
Reporter: Ufuk Celebi
Assignee: Ufuk Celebi
Priority: Minor


Add docs about the current state of checkpoint monitoring.





[jira] [Closed] (FLINK-5560) Header in checkpoint stats summary misaligned

2017-01-19 Thread Ufuk Celebi (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-5560?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ufuk Celebi closed FLINK-5560.
--
   Resolution: Fixed
Fix Version/s: 1.3.0
   1.2.0

Fixed in 772884e (release-1.2), 1be76f6 (master).

> Header in checkpoint stats summary misaligned
> -
>
> Key: FLINK-5560
> URL: https://issues.apache.org/jira/browse/FLINK-5560
> Project: Flink
>  Issue Type: Bug
>  Components: Webfrontend
>Reporter: Ufuk Celebi
>Assignee: Ufuk Celebi
>Priority: Minor
> Fix For: 1.2.0, 1.3.0
>
>
> The checkpoint summary stats table header line is misaligned. The first and 
> second header columns need to be swapped.





[jira] [Closed] (FLINK-5377) Improve savepoint docs

2017-01-19 Thread Ufuk Celebi (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-5377?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ufuk Celebi closed FLINK-5377.
--
   Resolution: Fixed
Fix Version/s: 1.3.0
   1.2.0

Fixed in 57a3955 (release-1.2), 6d8b3f5 (master).

> Improve savepoint docs
> --
>
> Key: FLINK-5377
> URL: https://issues.apache.org/jira/browse/FLINK-5377
> Project: Flink
>  Issue Type: Improvement
>  Components: Documentation
>Affects Versions: 1.2.0, 1.1.3
>Reporter: Ufuk Celebi
>Assignee: Ufuk Celebi
> Fix For: 1.2.0, 1.3.0
>
>
> The savepoint docs are very detailed and focus on the internals. They should 
> better convey what users have to take care of.
> The following questions should be answered:
> What happens if I add a new operator that requires state to my flow?
> What happens if I delete an operator that has state from my flow?
> What happens if I reorder stateful operators in my flow?
> What happens if I add, delete, or reorder operators that have no state in my 
> flow?
> Should I apply .uid to all operators in my flow?
> Should I apply .uid only to the operators that have state?





[jira] [Commented] (FLINK-5415) ContinuousFileProcessingTest failed on travis

2017-01-19 Thread Ufuk Celebi (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-5415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15829692#comment-15829692
 ] 

Ufuk Celebi commented on FLINK-5415:


https://s3.amazonaws.com/archive.travis-ci.org/jobs/193115046/log.txt

> ContinuousFileProcessingTest failed on travis
> -
>
> Key: FLINK-5415
> URL: https://issues.apache.org/jira/browse/FLINK-5415
> Project: Flink
>  Issue Type: Improvement
>Affects Versions: 1.3.0
>Reporter: Chesnay Schepler
>
> https://s3.amazonaws.com/archive.travis-ci.org/jobs/189171123/log.txt
> testFunctionRestore(org.apache.flink.hdfstests.ContinuousFileProcessingTest)  
> Time elapsed: 0.162 sec  <<< FAILURE!
> java.lang.AssertionError: expected:<1483623669528> but 
> was:<-9223372036854775808>
>   at org.junit.Assert.fail(Assert.java:88)
>   at org.junit.Assert.failNotEquals(Assert.java:834)
>   at org.junit.Assert.assertEquals(Assert.java:645)
>   at org.junit.Assert.assertEquals(Assert.java:631)
>   at 
> org.apache.flink.hdfstests.ContinuousFileProcessingTest.testFunctionRestore(ContinuousFileProcessingTest.java:761)
> testProcessOnce(org.apache.flink.hdfstests.ContinuousFileProcessingTest)  
> Time elapsed: 0.045 sec  <<< FAILURE!
> java.lang.AssertionError: null
>   at org.junit.Assert.fail(Assert.java:86)
>   at org.junit.Assert.assertTrue(Assert.java:41)
>   at org.junit.Assert.assertFalse(Assert.java:64)
>   at org.junit.Assert.assertFalse(Assert.java:74)
>   at 
> org.apache.flink.hdfstests.ContinuousFileProcessingTest.createFileAndFillWithData(ContinuousFileProcessingTest.java:958)
>   at 
> org.apache.flink.hdfstests.ContinuousFileProcessingTest.testProcessOnce(ContinuousFileProcessingTest.java:675)
> testFileReadingOperatorWithIngestionTime(org.apache.flink.hdfstests.ContinuousFileProcessingTest)
>   Time elapsed: 0.002 sec  <<< FAILURE!
> java.lang.AssertionError: null
>   at org.junit.Assert.fail(Assert.java:86)
>   at org.junit.Assert.assertTrue(Assert.java:41)
>   at org.junit.Assert.assertFalse(Assert.java:64)
>   at org.junit.Assert.assertFalse(Assert.java:74)
>   at 
> org.apache.flink.hdfstests.ContinuousFileProcessingTest.createFileAndFillWithData(ContinuousFileProcessingTest.java:958)
>   at 
> org.apache.flink.hdfstests.ContinuousFileProcessingTest.testFileReadingOperatorWithIngestionTime(ContinuousFileProcessingTest.java:150)
> testSortingOnModTime(org.apache.flink.hdfstests.ContinuousFileProcessingTest) 
>  Time elapsed: 0.002 sec  <<< FAILURE!
> java.lang.AssertionError: null
>   at org.junit.Assert.fail(Assert.java:86)
>   at org.junit.Assert.assertTrue(Assert.java:41)
>   at org.junit.Assert.assertFalse(Assert.java:64)
>   at org.junit.Assert.assertFalse(Assert.java:74)
>   at 
> org.apache.flink.hdfstests.ContinuousFileProcessingTest.createFileAndFillWithData(ContinuousFileProcessingTest.java:958)
>   at 
> org.apache.flink.hdfstests.ContinuousFileProcessingTest.testSortingOnModTime(ContinuousFileProcessingTest.java:596)
> testFileReadingOperatorWithEventTime(org.apache.flink.hdfstests.ContinuousFileProcessingTest)
>   Time elapsed: 0.001 sec  <<< FAILURE!
> java.lang.AssertionError: null
>   at org.junit.Assert.fail(Assert.java:86)
>   at org.junit.Assert.assertTrue(Assert.java:41)
>   at org.junit.Assert.assertFalse(Assert.java:64)
>   at org.junit.Assert.assertFalse(Assert.java:74)
>   at 
> org.apache.flink.hdfstests.ContinuousFileProcessingTest.createFileAndFillWithData(ContinuousFileProcessingTest.java:958)
>   at 
> org.apache.flink.hdfstests.ContinuousFileProcessingTest.testFileReadingOperatorWithEventTime(ContinuousFileProcessingTest.java:308)
> testProcessContinuously(org.apache.flink.hdfstests.ContinuousFileProcessingTest)
>   Time elapsed: 0.001 sec  <<< FAILURE!
> java.lang.AssertionError: null
>   at org.junit.Assert.fail(Assert.java:86)
>   at org.junit.Assert.assertTrue(Assert.java:41)
>   at org.junit.Assert.assertFalse(Assert.java:64)
>   at org.junit.Assert.assertFalse(Assert.java:74)
>   at 
> org.apache.flink.hdfstests.ContinuousFileProcessingTest.createFileAndFillWithData(ContinuousFileProcessingTest.java:958)
>   at 
> org.apache.flink.hdfstests.ContinuousFileProcessingTest.testProcessContinuously(ContinuousFileProcessingTest.java:771)
> Results :
> Failed tests: 
>   
> ContinuousFileProcessingTest.testFileReadingOperatorWithEventTime:308->createFileAndFillWithData:958
>  null
>   
> ContinuousFileProcessingTest.testFileReadingOperatorWithIngestionTime:150->createFileAndFillWithData:958
>  null
>   ContinuousFileProcessingTest.testFunctionRestore:761 
> expected:<1483623669528> but was:<-9223372036854775808>
>   
> 

[jira] [Created] (FLINK-5560) Header in checkpoint stats summary misaligned

2017-01-18 Thread Ufuk Celebi (JIRA)
Ufuk Celebi created FLINK-5560:
--

 Summary: Header in checkpoint stats summary misaligned
 Key: FLINK-5560
 URL: https://issues.apache.org/jira/browse/FLINK-5560
 Project: Flink
  Issue Type: Bug
  Components: Webfrontend
Reporter: Ufuk Celebi
Assignee: Ufuk Celebi
Priority: Minor


The checkpoint summary stats table header line is misaligned. The first and 
second header columns need to be swapped.





[jira] [Created] (FLINK-5556) BarrierBuffer resets bytes written on spiller roll over

2017-01-18 Thread Ufuk Celebi (JIRA)
Ufuk Celebi created FLINK-5556:
--

 Summary: BarrierBuffer resets bytes written on spiller roll over
 Key: FLINK-5556
 URL: https://issues.apache.org/jira/browse/FLINK-5556
 Project: Flink
  Issue Type: Bug
  Components: State Backends, Checkpointing
Reporter: Ufuk Celebi
Assignee: Ufuk Celebi
Priority: Minor


When rolling over a spilled sequence of buffers, the tracked number of bytes 
written is reset to 0. It is reported to the checkpoint listener right after 
this operation, which results in the reported buffered bytes always being 0.





[jira] [Closed] (FLINK-5484) Kryo serialization changed between 1.1 and 1.2

2017-01-18 Thread Ufuk Celebi (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-5484?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ufuk Celebi closed FLINK-5484.
--
   Resolution: Fixed
Fix Version/s: 1.3.0
   1.2.0

Fixed in
931929b (release-1.1),
55483b7, a7644b1 (release-1.2),
8fddae8, 586f818 (master).

> Kryo serialization changed between 1.1 and 1.2
> --
>
> Key: FLINK-5484
> URL: https://issues.apache.org/jira/browse/FLINK-5484
> Project: Flink
>  Issue Type: Bug
>  Components: Type Serialization System
>Reporter: Ufuk Celebi
> Fix For: 1.2.0, 1.3.0
>
>
> I think the way that Kryo serializes data changed between 1.1 and 1.2.
> I have a generic Object that is serialized as part of a 1.1 savepoint that I 
> cannot resume from with 1.2:
> {code}
> org.apache.flink.client.program.ProgramInvocationException: The program 
> execution failed: Job execution failed.
>   at 
> org.apache.flink.client.program.ClusterClient.run(ClusterClient.java:427)
>   at 
> org.apache.flink.client.program.StandaloneClusterClient.submitJob(StandaloneClusterClient.java:101)
>   at 
> org.apache.flink.client.program.ClusterClient.run(ClusterClient.java:400)
>   at 
> org.apache.flink.streaming.api.environment.StreamContextEnvironment.execute(StreamContextEnvironment.java:68)
>   at 
> org.apache.flink.streaming.api.environment.StreamExecutionEnvironment.execute(StreamExecutionEnvironment.java:1486)
>   at com.dataartisans.DidKryoChange.main(DidKryoChange.java:74)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:528)
>   at 
> org.apache.flink.client.program.PackagedProgram.invokeInteractiveModeForExecution(PackagedProgram.java:419)
>   at 
> org.apache.flink.client.program.ClusterClient.run(ClusterClient.java:339)
>   at 
> org.apache.flink.client.CliFrontend.executeProgram(CliFrontend.java:831)
>   at org.apache.flink.client.CliFrontend.run(CliFrontend.java:256)
>   at 
> org.apache.flink.client.CliFrontend.parseParameters(CliFrontend.java:1073)
>   at org.apache.flink.client.CliFrontend$2.call(CliFrontend.java:1120)
>   at org.apache.flink.client.CliFrontend$2.call(CliFrontend.java:1117)
>   at 
> org.apache.flink.runtime.security.HadoopSecurityContext$1.run(HadoopSecurityContext.java:43)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
>   at 
> org.apache.flink.runtime.security.HadoopSecurityContext.runSecured(HadoopSecurityContext.java:40)
>   at org.apache.flink.client.CliFrontend.main(CliFrontend.java:1117)
> Caused by: org.apache.flink.runtime.client.JobExecutionException: Job 
> execution failed.
>   at 
> org.apache.flink.runtime.jobmanager.JobManager$$anonfun$handleMessage$1$$anonfun$applyOrElse$7.apply$mcV$sp(JobManager.scala:900)
>   at 
> org.apache.flink.runtime.jobmanager.JobManager$$anonfun$handleMessage$1$$anonfun$applyOrElse$7.apply(JobManager.scala:843)
>   at 
> org.apache.flink.runtime.jobmanager.JobManager$$anonfun$handleMessage$1$$anonfun$applyOrElse$7.apply(JobManager.scala:843)
>   at 
> scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24)
>   at 
> scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24)
>   at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:40)
>   at 
> akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:397)
>   at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
>   at 
> scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
>   at 
> scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
>   at 
> scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
> Caused by: java.lang.IllegalStateException: Could not initialize keyed state 
> backend.
>   at 
> org.apache.flink.streaming.api.operators.AbstractStreamOperator.initKeyedState(AbstractStreamOperator.java:286)
>   at 
> org.apache.flink.streaming.api.operators.AbstractStreamOperator.initializeState(AbstractStreamOperator.java:199)
>   at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.initializeOperators(StreamTask.java:649)
>   at 
> 

[jira] [Commented] (FLINK-2608) Arrays.asList(..) does not work with CollectionInputFormat

2017-01-18 Thread Ufuk Celebi (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15828174#comment-15828174
 ] 

Ufuk Celebi commented on FLINK-2608:


Reverted in a7644b1 (release-1.2), 586f818 (master).

> Arrays.asList(..) does not work with CollectionInputFormat
> --
>
> Key: FLINK-2608
> URL: https://issues.apache.org/jira/browse/FLINK-2608
> Project: Flink
>  Issue Type: Bug
>  Components: Type Serialization System
>Affects Versions: 0.9, 0.10.0
>Reporter: Maximilian Michels
>Assignee: Alexander Chermenin
>Priority: Minor
> Fix For: 1.2.0
>
>
> When using Arrays.asList(..) as input for a CollectionInputFormat, the 
> serialization/deserialization fails when deploying the task.
> See the following program:
> {code:java}
> public class WordCountExample {
>     public static void main(String[] args) throws Exception {
>         final ExecutionEnvironment env =
>                 ExecutionEnvironment.getExecutionEnvironment();
>         DataSet<String> text = env.fromElements(
>                 "Who's there?",
>                 "I think I hear them. Stand, ho! Who's there?");
>         // DOES NOT WORK
>         List<Integer> elements = Arrays.asList(0, 0, 0);
>         // The following works:
>         //List<Integer> elements = new ArrayList<>(new int[] {0,0,0});
>         DataSet<TestClass> set = env.fromElements(new TestClass(elements));
>         DataSet<Tuple2<String, Integer>> wordCounts = text
>                 .flatMap(new LineSplitter())
>                 .withBroadcastSet(set, "set")
>                 .groupBy(0)
>                 .sum(1);
>         wordCounts.print();
>     }
>     public static class LineSplitter
>             implements FlatMapFunction<String, Tuple2<String, Integer>> {
>         @Override
>         public void flatMap(String line, Collector<Tuple2<String, Integer>> out) {
>             for (String word : line.split(" ")) {
>                 out.collect(new Tuple2<String, Integer>(word, 1));
>             }
>         }
>     }
>     public static class TestClass implements Serializable {
>         private static final long serialVersionUID = -2932037991574118651L;
>         List<Integer> integerList;
>         public TestClass(List<Integer> integerList) {
>             this.integerList = integerList;
>         }
>     }
> }
> {code}
> {noformat}
> Exception in thread "main" 
> org.apache.flink.runtime.client.JobExecutionException: Cannot initialize task 
> 'DataSource (at main(Test.java:32) 
> (org.apache.flink.api.java.io.CollectionInputFormat))': Deserializing the 
> InputFormat ([mytests.Test$TestClass@4d6025c5]) failed: unread block data
> at 
> org.apache.flink.runtime.jobmanager.JobManager$$anonfun$org$apache$flink$runtime$jobmanager$JobManager$$submitJob$4.apply(JobManager.scala:523)
> at 
> org.apache.flink.runtime.jobmanager.JobManager$$anonfun$org$apache$flink$runtime$jobmanager$JobManager$$submitJob$4.apply(JobManager.scala:507)
> at scala.collection.Iterator$class.foreach(Iterator.scala:727)
> at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
> at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
> at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
> at 
> org.apache.flink.runtime.jobmanager.JobManager.org$apache$flink$runtime$jobmanager$JobManager$$submitJob(JobManager.scala:507)
> at 
> org.apache.flink.runtime.jobmanager.JobManager$$anonfun$receiveWithLogMessages$1.applyOrElse(JobManager.scala:190)
> at 
> scala.runtime.AbstractPartialFunction$mcVL$sp.apply$mcVL$sp(AbstractPartialFunction.scala:33)
> at 
> scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:33)
> at 
> scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:25)
> at 
> org.apache.flink.runtime.ActorLogMessages$$anon$1.apply(ActorLogMessages.scala:43)
> at 
> org.apache.flink.runtime.ActorLogMessages$$anon$1.apply(ActorLogMessages.scala:29)
> at scala.PartialFunction$class.applyOrElse(PartialFunction.scala:118)
> at 
> org.apache.flink.runtime.ActorLogMessages$$anon$1.applyOrElse(ActorLogMessages.scala:29)
> at akka.actor.Actor$class.aroundReceive(Actor.scala:465)
> at 
> org.apache.flink.runtime.jobmanager.JobManager.aroundReceive(JobManager.scala:92)
> at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516)
> at akka.actor.ActorCell.invoke(ActorCell.scala:487)
> at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:254)
> at akka.dispatch.Mailbox.run(Mailbox.scala:221)
> at akka.dispatch.Mailbox.exec(Mailbox.scala:231)
> at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
> at 
> scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
> at 
> 
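The workaround hinted at by the commented-out line in the description (copying into a plain ArrayList) can be sketched as follows; the helper name is illustrative and not part of Flink:

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the workaround from the description above: Arrays.asList returns
// the private fixed-size java.util.Arrays$ArrayList, which the InputFormat
// serialization path failed to handle here; copying the view into a plain
// java.util.ArrayList avoids the problematic class entirely.
class AsListWorkaround {
    static <T> List<T> shippableCopy(List<T> view) {
        return new ArrayList<>(view);
    }
}
```

The copy is a plain `java.util.ArrayList` with the same elements, so it serializes like any ordinary collection.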

[jira] [Reopened] (FLINK-2608) Arrays.asList(..) does not work with CollectionInputFormat

2017-01-18 Thread Ufuk Celebi (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-2608?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ufuk Celebi reopened FLINK-2608:


Have to reopen this because we can't update our Chill dependency without 
breaking savepoint compatibility between 1.1 and 1.2.
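The mechanism behind the incompatibility can be illustrated with a toy model (a simplification, not Kryo's actual implementation): Kryo-style formats write each registered class as a small integer ID assigned in registration order, so a Chill upgrade that changes the default registrations remaps the IDs stored in old savepoints.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model only: real Kryo/Chill registration is more involved, but the core
// issue is the same -- class IDs are positional, so reordering or changing the
// default registrations between versions makes old serialized IDs resolve to
// the wrong class, or to none at all.
class ToyClassRegistry {
    private final Map<Integer, String> idToClass = new HashMap<>();
    private int nextId = 0;

    int register(String className) {
        idToClass.put(nextId, className);
        return nextId++;
    }

    String resolve(int id) {
        return idToClass.get(id);
    }
}
```

A savepoint written against the old registry stores the numeric ID; a client with a differently ordered registry resolves that same ID to a different class.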

> Arrays.asList(..) does not work with CollectionInputFormat
> --
>
> Key: FLINK-2608
> URL: https://issues.apache.org/jira/browse/FLINK-2608
> Project: Flink
>  Issue Type: Bug
>  Components: Type Serialization System
>Affects Versions: 0.9, 0.10.0
>Reporter: Maximilian Michels
>Assignee: Alexander Chermenin
>Priority: Minor
> Fix For: 1.2.0
>
>
> When using Arrays.asList(..) as input for a CollectionInputFormat, the 
> serialization/deserialization fails when deploying the task.
> See the following program:
> {code:java}
> public class WordCountExample {
>     public static void main(String[] args) throws Exception {
>         final ExecutionEnvironment env =
>                 ExecutionEnvironment.getExecutionEnvironment();
>         DataSet<String> text = env.fromElements(
>                 "Who's there?",
>                 "I think I hear them. Stand, ho! Who's there?");
>         // DOES NOT WORK
>         List<Integer> elements = Arrays.asList(0, 0, 0);
>         // The following works:
>         //List<Integer> elements = new ArrayList<>(new int[] {0,0,0});
>         DataSet<TestClass> set = env.fromElements(new TestClass(elements));
>         DataSet<Tuple2<String, Integer>> wordCounts = text
>                 .flatMap(new LineSplitter())
>                 .withBroadcastSet(set, "set")
>                 .groupBy(0)
>                 .sum(1);
>         wordCounts.print();
>     }
>     public static class LineSplitter
>             implements FlatMapFunction<String, Tuple2<String, Integer>> {
>         @Override
>         public void flatMap(String line, Collector<Tuple2<String, Integer>> out) {
>             for (String word : line.split(" ")) {
>                 out.collect(new Tuple2<String, Integer>(word, 1));
>             }
>         }
>     }
>     public static class TestClass implements Serializable {
>         private static final long serialVersionUID = -2932037991574118651L;
>         List<Integer> integerList;
>         public TestClass(List<Integer> integerList) {
>             this.integerList = integerList;
>         }
>     }
> }
> {code}
> {noformat}
> Exception in thread "main" 
> org.apache.flink.runtime.client.JobExecutionException: Cannot initialize task 
> 'DataSource (at main(Test.java:32) 
> (org.apache.flink.api.java.io.CollectionInputFormat))': Deserializing the 
> InputFormat ([mytests.Test$TestClass@4d6025c5]) failed: unread block data
> at 
> org.apache.flink.runtime.jobmanager.JobManager$$anonfun$org$apache$flink$runtime$jobmanager$JobManager$$submitJob$4.apply(JobManager.scala:523)
> at 
> org.apache.flink.runtime.jobmanager.JobManager$$anonfun$org$apache$flink$runtime$jobmanager$JobManager$$submitJob$4.apply(JobManager.scala:507)
> at scala.collection.Iterator$class.foreach(Iterator.scala:727)
> at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
> at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
> at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
> at 
> org.apache.flink.runtime.jobmanager.JobManager.org$apache$flink$runtime$jobmanager$JobManager$$submitJob(JobManager.scala:507)
> at 
> org.apache.flink.runtime.jobmanager.JobManager$$anonfun$receiveWithLogMessages$1.applyOrElse(JobManager.scala:190)
> at 
> scala.runtime.AbstractPartialFunction$mcVL$sp.apply$mcVL$sp(AbstractPartialFunction.scala:33)
> at 
> scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:33)
> at 
> scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:25)
> at 
> org.apache.flink.runtime.ActorLogMessages$$anon$1.apply(ActorLogMessages.scala:43)
> at 
> org.apache.flink.runtime.ActorLogMessages$$anon$1.apply(ActorLogMessages.scala:29)
> at scala.PartialFunction$class.applyOrElse(PartialFunction.scala:118)
> at 
> org.apache.flink.runtime.ActorLogMessages$$anon$1.applyOrElse(ActorLogMessages.scala:29)
> at akka.actor.Actor$class.aroundReceive(Actor.scala:465)
> at 
> org.apache.flink.runtime.jobmanager.JobManager.aroundReceive(JobManager.scala:92)
> at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516)
> at akka.actor.ActorCell.invoke(ActorCell.scala:487)
> at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:254)
> at akka.dispatch.Mailbox.run(Mailbox.scala:221)
> at akka.dispatch.Mailbox.exec(Mailbox.scala:231)
> at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
> at 
> scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
> at 
> 

[jira] [Created] (FLINK-5510) Replace Scala Future with FlinkFuture in QueryableStateClient

2017-01-16 Thread Ufuk Celebi (JIRA)
Ufuk Celebi created FLINK-5510:
--

 Summary: Replace Scala Future with FlinkFuture in 
QueryableStateClient
 Key: FLINK-5510
 URL: https://issues.apache.org/jira/browse/FLINK-5510
 Project: Flink
  Issue Type: Improvement
  Components: Queryable State
Reporter: Ufuk Celebi
Priority: Minor


The entry point for queryable state users is the {{QueryableStateClient}}, which 
returns query results via Scala Futures. Since merging the initial version of 
QueryableState we have introduced the FlinkFuture wrapper type in order not to 
expose our Scala dependency via the API.

Since APIs tend to stick around longer than expected, it might be worthwhile to 
port the exposed QueryableStateClient interface to use the FlinkFuture. Early 
users can still get the Scala Future via FlinkFuture#getScalaFuture().
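The wrapper idea can be sketched with a minimal stand-in (names and types here are illustrative; FlinkFuture itself wraps a Scala Future, which this sketch replaces with a CompletableFuture to stay self-contained):

```java
import java.util.concurrent.CompletableFuture;

// Minimal stand-in for the wrapper pattern: the public API exposes only our
// own future type, keeping the underlying dependency out of user code, while
// an escape hatch (analogous to FlinkFuture#getScalaFuture()) still hands
// early users the wrapped future.
class ApiFuture<T> {
    private final CompletableFuture<T> delegate; // stands in for the Scala Future

    ApiFuture(CompletableFuture<T> delegate) {
        this.delegate = delegate;
    }

    T get() {
        return delegate.join();
    }

    // escape hatch for users of the old Scala-based API
    CompletableFuture<T> getUnderlying() {
        return delegate;
    }
}
```

Callers program against `ApiFuture` only; the wrapped type never appears in their signatures unless they opt into the escape hatch.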



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (FLINK-5509) Replace QueryableStateClient keyHashCode argument

2017-01-16 Thread Ufuk Celebi (JIRA)
Ufuk Celebi created FLINK-5509:
--

 Summary: Replace QueryableStateClient keyHashCode argument
 Key: FLINK-5509
 URL: https://issues.apache.org/jira/browse/FLINK-5509
 Project: Flink
  Issue Type: Improvement
  Components: Queryable State
Reporter: Ufuk Celebi
Priority: Minor


When going over the low level QueryableStateClient with [~NicoK] we noticed 
that the key hashCode argument can be confusing to users:

{code}
Future<byte[]> getKvState(
  JobID jobId,
  String name,
  int keyHashCode,
  byte[] serializedKeyAndNamespace)
{code}

The {{keyHashCode}} argument is the result of calling {{hashCode()}} on the key 
to look up. This is what is sent to the JobManager in order to look up the 
location of the key. While straightforward, it is repetitive and possibly 
confusing.

As an alternative, we suggest making the method generic and calling hashCode 
on the key object ourselves. This way the user just provides the key object.

Since there are some early users of the queryable state API already, we 
suggest renaming the method in order to provoke a compilation error after 
upgrading to the released 1.2 version.

(This would also work without renaming, since the hashCode of an Integer (what 
users currently provide) is the number itself, but it would be confusing why it 
actually works.)
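The proposed change can be sketched as follows (type names are simplified stand-ins, not Flink's actual API; jobId is a plain String instead of Flink's JobID and the query itself is stubbed out): the generic overload computes the hash internally and delegates to the existing method.

```java
import java.util.concurrent.CompletableFuture;

// Simplified sketch of the proposed API change; only the signature is of
// interest, so the lookup is stubbed to echo the payload.
class KvStateClientSketch {
    // Current style: the caller passes key.hashCode() explicitly.
    CompletableFuture<byte[]> getKvState(String jobId, String name, int keyHashCode,
                                         byte[] serializedKeyAndNamespace) {
        // ...would use keyHashCode to locate the key group and issue the query;
        // here we just complete with the payload to keep the sketch self-contained.
        return CompletableFuture.completedFuture(serializedKeyAndNamespace);
    }

    // Proposed style: generic over the key; the hash is computed internally.
    <K> CompletableFuture<byte[]> getKvState(String jobId, String name, K key,
                                             byte[] serializedKeyAndNamespace) {
        return getKvState(jobId, name, key.hashCode(), serializedKeyAndNamespace);
    }
}
```

With the generic overload, users pass the key object directly and can no longer confuse a hash code with the key itself.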



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-5507) remove queryable list state sink

2017-01-16 Thread Ufuk Celebi (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-5507?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15823895#comment-15823895
 ] 

Ufuk Celebi commented on FLINK-5507:


+1

Would you like to do that and open a PR?

> remove queryable list state sink
> 
>
> Key: FLINK-5507
> URL: https://issues.apache.org/jira/browse/FLINK-5507
> Project: Flink
>  Issue Type: Improvement
>  Components: State Backends, Checkpointing
>Affects Versions: 1.2.0
>Reporter: Nico Kruber
>Assignee: Nico Kruber
>
> The queryable state "sink" using ListState
> (".asQueryableState(name, ListStateDescriptor)") stores all incoming data
> forever and is never cleaned up. Eventually it will consume too much memory
> and is thus of limited use.
> We should remove it from the API.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (FLINK-5484) Kryo serialization changed between 1.1 and 1.2

2017-01-13 Thread Ufuk Celebi (JIRA)
Ufuk Celebi created FLINK-5484:
--

 Summary: Kryo serialization changed between 1.1 and 1.2
 Key: FLINK-5484
 URL: https://issues.apache.org/jira/browse/FLINK-5484
 Project: Flink
  Issue Type: Bug
  Components: Type Serialization System
Reporter: Ufuk Celebi


I think the way that Kryo serializes data changed between 1.1 and 1.2.

I have a generic Object that is serialized as part of a 1.1 savepoint that I 
cannot resume from with 1.2:

{code}
org.apache.flink.client.program.ProgramInvocationException: The program 
execution failed: Job execution failed.
at 
org.apache.flink.client.program.ClusterClient.run(ClusterClient.java:427)
at 
org.apache.flink.client.program.StandaloneClusterClient.submitJob(StandaloneClusterClient.java:101)
at 
org.apache.flink.client.program.ClusterClient.run(ClusterClient.java:400)
at 
org.apache.flink.streaming.api.environment.StreamContextEnvironment.execute(StreamContextEnvironment.java:68)
at 
org.apache.flink.streaming.api.environment.StreamExecutionEnvironment.execute(StreamExecutionEnvironment.java:1486)
at com.dataartisans.DidKryoChange.main(DidKryoChange.java:74)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:528)
at 
org.apache.flink.client.program.PackagedProgram.invokeInteractiveModeForExecution(PackagedProgram.java:419)
at 
org.apache.flink.client.program.ClusterClient.run(ClusterClient.java:339)
at 
org.apache.flink.client.CliFrontend.executeProgram(CliFrontend.java:831)
at org.apache.flink.client.CliFrontend.run(CliFrontend.java:256)
at 
org.apache.flink.client.CliFrontend.parseParameters(CliFrontend.java:1073)
at org.apache.flink.client.CliFrontend$2.call(CliFrontend.java:1120)
at org.apache.flink.client.CliFrontend$2.call(CliFrontend.java:1117)
at 
org.apache.flink.runtime.security.HadoopSecurityContext$1.run(HadoopSecurityContext.java:43)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
at 
org.apache.flink.runtime.security.HadoopSecurityContext.runSecured(HadoopSecurityContext.java:40)
at org.apache.flink.client.CliFrontend.main(CliFrontend.java:1117)
Caused by: org.apache.flink.runtime.client.JobExecutionException: Job execution 
failed.
at 
org.apache.flink.runtime.jobmanager.JobManager$$anonfun$handleMessage$1$$anonfun$applyOrElse$7.apply$mcV$sp(JobManager.scala:900)
at 
org.apache.flink.runtime.jobmanager.JobManager$$anonfun$handleMessage$1$$anonfun$applyOrElse$7.apply(JobManager.scala:843)
at 
org.apache.flink.runtime.jobmanager.JobManager$$anonfun$handleMessage$1$$anonfun$applyOrElse$7.apply(JobManager.scala:843)
at 
scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24)
at 
scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24)
at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:40)
at 
akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:397)
at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
at 
scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
at 
scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
at 
scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
Caused by: java.lang.IllegalStateException: Could not initialize keyed state 
backend.
at 
org.apache.flink.streaming.api.operators.AbstractStreamOperator.initKeyedState(AbstractStreamOperator.java:286)
at 
org.apache.flink.streaming.api.operators.AbstractStreamOperator.initializeState(AbstractStreamOperator.java:199)
at 
org.apache.flink.streaming.runtime.tasks.StreamTask.initializeOperators(StreamTask.java:649)
at 
org.apache.flink.streaming.runtime.tasks.StreamTask.initializeState(StreamTask.java:636)
at 
org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:254)
at org.apache.flink.runtime.taskmanager.Task.run(Task.java:654)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.RuntimeException: com.esotericsoftware.kryo.KryoException: 
Unable to find class: f
at 

[jira] [Closed] (FLINK-5467) Stateless chained tasks set legacy operator state

2017-01-13 Thread Ufuk Celebi (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-5467?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ufuk Celebi closed FLINK-5467.
--
   Resolution: Fixed
Fix Version/s: 1.3.0
   1.2.0

Fixed in 
af244aa (release-1.2), 
51a3573 (master).

> Stateless chained tasks set legacy operator state
> -
>
> Key: FLINK-5467
> URL: https://issues.apache.org/jira/browse/FLINK-5467
> Project: Flink
>  Issue Type: Bug
>  Components: State Backends, Checkpointing
>Reporter: Ufuk Celebi
>Assignee: Stefan Richter
> Fix For: 1.2.0, 1.3.0
>
>
> I discovered this while trying to rescale a job with a Kafka source with a 
> chained stateless operator.
> Looking into it, it turns out that this fails because the checkpointed state 
> contains legacy operator state for the chained operator although it is 
> stateless.
> /cc [~aljoscha] You mentioned that this might be a possible duplicate?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (FLINK-5477) Summary columns not aligned correctly

2017-01-13 Thread Ufuk Celebi (JIRA)
Ufuk Celebi created FLINK-5477:
--

 Summary: Summary columns not aligned correctly
 Key: FLINK-5477
 URL: https://issues.apache.org/jira/browse/FLINK-5477
 Project: Flink
  Issue Type: Bug
  Components: Webfrontend
Reporter: Ufuk Celebi
Assignee: Ufuk Celebi
Priority: Minor


[~rmetzger] reported that the summary stats columns in the reworked checkpoint 
stats are not correctly aligned.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Closed] (FLINK-5444) Flink UI uses absolute URLs.

2017-01-12 Thread Ufuk Celebi (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-5444?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ufuk Celebi closed FLINK-5444.
--
   Resolution: Fixed
Fix Version/s: 1.3.0
   1.2.0

Fixed in
dff58df, 80f1517 (release-1.2),
42b53e6, d7e862a (master).

> Flink UI uses absolute URLs.
> 
>
> Key: FLINK-5444
> URL: https://issues.apache.org/jira/browse/FLINK-5444
> Project: Flink
>  Issue Type: Bug
>Reporter: Joerg Schad
>Assignee: Joerg Schad
> Fix For: 1.2.0, 1.3.0
>
>
> The Flink UI has a mixed use of absolute and relative links. See for example 
> [here](https://github.com/apache/flink/blob/master/flink-runtime-web/web-dashboard/web/index.html)
> {code:|borderStyle=solid}
> <link rel="icon" type="image/png" href="/images/favicon-16x16.png" sizes="16x16">
> {code}
> When referencing the UI from another UI, e.g., the DC/OS UI, relative links 
> are preferred.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Closed] (FLINK-5466) Make production environment default in gulpfile

2017-01-12 Thread Ufuk Celebi (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-5466?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ufuk Celebi closed FLINK-5466.
--
   Resolution: Fixed
 Assignee: Ufuk Celebi
Fix Version/s: 1.1.5
   1.3.0
   1.2.0

Fixed in
12cf5dc, 4ea52d6 (release-1.1),
e55d426, 624f8ae (release-1.2),
408f6ea, e1181f6 (master).

> Make production environment default in gulpfile
> ---
>
> Key: FLINK-5466
> URL: https://issues.apache.org/jira/browse/FLINK-5466
> Project: Flink
>  Issue Type: Improvement
>  Components: Webfrontend
>Affects Versions: 1.2.0, 1.1.4
>Reporter: Ufuk Celebi
>Assignee: Ufuk Celebi
> Fix For: 1.2.0, 1.3.0, 1.1.5
>
>
> Currently the default environment set in our gulpfile is development, which 
> leads to very large generated JS files. When building the web UI we apparently 
> forgot to set the environment to production (built via gulp production).
> Since this is likely to occur again, we should make production the default 
> environment and only use development deliberately.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (FLINK-5467) Stateless chained tasks set legacy operator state

2017-01-12 Thread Ufuk Celebi (JIRA)
Ufuk Celebi created FLINK-5467:
--

 Summary: Stateless chained tasks set legacy operator state
 Key: FLINK-5467
 URL: https://issues.apache.org/jira/browse/FLINK-5467
 Project: Flink
  Issue Type: Bug
  Components: State Backends, Checkpointing
Reporter: Ufuk Celebi


I discovered this while trying to rescale a job with a Kafka source with a 
chained stateless operator.

Looking into it, it turns out that this fails because the checkpointed state 
contains legacy operator state for the chained operator although it is 
stateless.

/cc [~aljoscha] You mentioned that this might be a possible duplicate?




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (FLINK-5466) Make production environment default in gulpfile

2017-01-12 Thread Ufuk Celebi (JIRA)
Ufuk Celebi created FLINK-5466:
--

 Summary: Make production environment default in gulpfile
 Key: FLINK-5466
 URL: https://issues.apache.org/jira/browse/FLINK-5466
 Project: Flink
  Issue Type: Improvement
  Components: Webfrontend
Affects Versions: 1.1.4, 1.2.0
Reporter: Ufuk Celebi


Currently the default environment set in our gulpfile is development, which 
leads to very large generated JS files. When building the web UI we apparently 
forgot to set the environment to production (built via gulp production).

Since this is likely to occur again, we should make production the default 
environment and only use development deliberately.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (FLINK-5448) Fix typo in StateAssignmentOperation Exception

2017-01-11 Thread Ufuk Celebi (JIRA)
Ufuk Celebi created FLINK-5448:
--

 Summary: Fix typo in StateAssignmentOperation Exception
 Key: FLINK-5448
 URL: https://issues.apache.org/jira/browse/FLINK-5448
 Project: Flink
  Issue Type: Improvement
  Components: State Backends, Checkpointing
Reporter: Ufuk Celebi
Priority: Trivial


{code}
Cannot restore the latest checkpoint because the operator 
cbc357ccb763df2852fee8c4fc7d55f2 has non-partitioned state and its parallelism 
changed. The operatorcbc357ccb763df2852fee8c4fc7d55f2 has parallelism 2 whereas 
the correspondingstate object has a parallelism of 4
{code}

White space is missing in some places. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (FLINK-5440) Misleading error message when migrating and scaling down from savepoint

2017-01-10 Thread Ufuk Celebi (JIRA)
Ufuk Celebi created FLINK-5440:
--

 Summary: Misleading error message when migrating and scaling down 
from savepoint
 Key: FLINK-5440
 URL: https://issues.apache.org/jira/browse/FLINK-5440
 Project: Flink
  Issue Type: Improvement
  Components: State Backends, Checkpointing
Reporter: Ufuk Celebi
Priority: Minor


When resuming from a 1.1 savepoint with 1.2 and reducing the parallelism (and 
correctly setting the max parallelism), the error message says something about 
a missing operator, which is misleading. Restoring from the same savepoint with 
the savepoint parallelism works as expected.

Instead, it should state that this kind of operation is not possible.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (FLINK-5439) Adjust max parallelism when migrating

2017-01-10 Thread Ufuk Celebi (JIRA)
Ufuk Celebi created FLINK-5439:
--

 Summary: Adjust max parallelism when migrating
 Key: FLINK-5439
 URL: https://issues.apache.org/jira/browse/FLINK-5439
 Project: Flink
  Issue Type: Improvement
  Components: State Backends, Checkpointing
Reporter: Ufuk Celebi
Priority: Minor


When migrating from v1 savepoints which don't have the notion of a max 
parallelism, the job needs to explicitly set the max parallelism to the 
parallelism of the savepoint.

[~stefanrichte...@gmail.com] If this not trivially implemented, let's close 
this as won't fix.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (FLINK-5438) Typo in JobGraph generator Exception

2017-01-10 Thread Ufuk Celebi (JIRA)
Ufuk Celebi created FLINK-5438:
--

 Summary: Typo in JobGraph generator Exception 
 Key: FLINK-5438
 URL: https://issues.apache.org/jira/browse/FLINK-5438
 Project: Flink
  Issue Type: Improvement
  Components: Client
Reporter: Ufuk Celebi
Priority: Trivial


When trying to run a job with parallelism > max parallelism, there is a typo in 
the error message:

{code}
Caused by: java.lang.IllegalStateException: The maximum parallelism (1) of the 
stream node Flat Map-3 is smaller than the parallelism (18). Increase the 
maximum parallelism or decrease the parallelism >>>ofthis<<< operator.
at 
org.apache.flink.streaming.api.graph.StreamingJobGraphGenerator.createJobVertex(StreamingJobGraphGenerator.java:318)
{code}





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (FLINK-5437) Make CheckpointedRestoring error message more detailed

2017-01-10 Thread Ufuk Celebi (JIRA)
Ufuk Celebi created FLINK-5437:
--

 Summary: Make CheckpointedRestoring error message more detailed
 Key: FLINK-5437
 URL: https://issues.apache.org/jira/browse/FLINK-5437
 Project: Flink
  Issue Type: Improvement
  Components: State Backends, Checkpointing
Reporter: Ufuk Celebi
Priority: Minor


When restoring Checkpointed state without implementing CheckpointedRestoring, 
the job fails with the following Exception:
{code}
java.lang.Exception: Found UDF state but operator is not instance of 
CheckpointedRestoring
{code}

I think we should make this error message more detailed.
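For illustration, a more detailed message could name the operator and point at the fix. The wording and the operator name/ID parameters below are assumptions for the sake of the sketch, not the actual Flink code:

```java
// Illustrative only: one shape the improved message could take. The
// operatorName/operatorId parameters are hypothetical inputs, not fields
// that necessarily exist at the throw site.
class RestoreErrorMessage {
    static String build(String operatorName, String operatorId) {
        return String.format(
            "Found UDF state for operator %s (id %s), but the operator does not "
                + "implement CheckpointedRestoring. Implement CheckpointedRestoring "
                + "to restore state written via the legacy Checkpointed interface, "
                + "or restore from a savepoint without this state.",
            operatorName, operatorId);
    }
}
```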



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (FLINK-5436) UDF state without CheckpointedRestoring can result in restarting loop

2017-01-10 Thread Ufuk Celebi (JIRA)
Ufuk Celebi created FLINK-5436:
--

 Summary: UDF state without CheckpointedRestoring can result in 
restarting loop
 Key: FLINK-5436
 URL: https://issues.apache.org/jira/browse/FLINK-5436
 Project: Flink
  Issue Type: Improvement
  Components: State Backends, Checkpointing
Reporter: Ufuk Celebi
Priority: Minor


When restoring a job with Checkpointed state and not implementing the new 
CheckpointedRestoring interface, the job will be restarted over and over again 
(given the respective restarting strategy).

Since this is not recoverable, we should immediately fail the job.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Closed] (FLINK-5102) Connection establishment does not react to interrupt

2017-01-10 Thread Ufuk Celebi (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-5102?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ufuk Celebi closed FLINK-5102.
--
Resolution: Cannot Reproduce

> Connection establishment does not react to interrupt
> 
>
> Key: FLINK-5102
> URL: https://issues.apache.org/jira/browse/FLINK-5102
> Project: Flink
>  Issue Type: Bug
>Affects Versions: 1.1.3
>Reporter: Ufuk Celebi
>Assignee: Ufuk Celebi
> Fix For: 1.2.0
>
>
> Interrupting a connection establishment does not work; the wait does not 
> react to interrupts.
> {code}
> Task - Task '... (60/120)' did not react to cancelling signal, but is stuck 
> in method:
> java.lang.Object.$$YJP$$wait(Native Method)
> java.lang.Object.wait(Object.java)
> org.apache.flink.runtime.io.network.netty.PartitionRequestClientFactory$ConnectingChannel.waitForChannel(PartitionRequestClientFactory.java:191)
> org.apache.flink.runtime.io.network.netty.PartitionRequestClientFactory$ConnectingChannel.access$000(PartitionRequestClientFactory.java:131)
> org.apache.flink.runtime.io.network.netty.PartitionRequestClientFactory.createPartitionRequestClient(PartitionRequestClientFactory.java:83)
> org.apache.flink.runtime.io.network.netty.NettyConnectionManager.createPartitionRequestClient(NettyConnectionManager.java:60)
> org.apache.flink.runtime.io.network.partition.consumer.RemoteInputChannel.requestSubpartition(RemoteInputChannel.java:118)
> org.apache.flink.runtime.io.network.partition.consumer.SingleInputGate.requestPartitions(SingleInputGate.java:395)
> org.apache.flink.runtime.io.network.partition.consumer.SingleInputGate.getNextBufferOrEvent(SingleInputGate.java:414)
> org.apache.flink.streaming.runtime.io.BarrierBuffer.getNextNonBlocked(BarrierBuffer.java:152)
> org.apache.flink.streaming.runtime.io.StreamInputProcessor.processInput(StreamInputProcessor.java:195)
> org.apache.flink.streaming.runtime.tasks.OneInputStreamTask.run(OneInputStreamTask.java:67)
> org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:266)
> org.apache.flink.runtime.taskmanager.Task.run(Task.java:638)
> java.lang.Thread.run(Thread.java:745)
> {code}
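For context, the waitForChannel pattern in the stack trace boils down to a monitor wait like the sketch below (names assumed; this is not the actual PartitionRequestClientFactory code). A plain Object.wait does throw InterruptedException, so a hang like the one above usually means the interrupt is swallowed or never delivered; the sketch shows the variant that propagates it.

```java
// Sketch with assumed names: a bounded monitor wait that lets both the
// timeout and the interrupt surface to the caller instead of looping past them.
class ChannelWaiter {
    private Object channel; // set by the connect callback

    synchronized void offer(Object ch) {
        channel = ch;
        notifyAll();
    }

    // Returns the channel, or null on timeout; cancellation interrupts the
    // wait and propagates as InterruptedException.
    synchronized Object waitForChannel(long timeoutMillis) throws InterruptedException {
        long deadline = System.currentTimeMillis() + timeoutMillis;
        while (channel == null) {
            long remaining = deadline - System.currentTimeMillis();
            if (remaining <= 0) {
                return null; // timed out; caller decides how to report this
            }
            wait(remaining);
        }
        return channel;
    }
}
```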



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Closed] (FLINK-5306) Display checkpointing configuration details in web UI "Configuration" tab

2017-01-10 Thread Ufuk Celebi (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-5306?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ufuk Celebi closed FLINK-5306.
--
   Resolution: Implemented
Fix Version/s: 1.3.0
   1.2.0

Implemented in 579bc96, dec0d6b, 27df801 (master), 0d1f4bc, 1fd2d2e, 3f2e996 
(release-1.2).

> Display checkpointing configuration details in web UI "Configuration" tab
> -
>
> Key: FLINK-5306
> URL: https://issues.apache.org/jira/browse/FLINK-5306
> Project: Flink
>  Issue Type: Improvement
>  Components: Webfrontend
>Affects Versions: 1.1.3
>Reporter: Robert Metzger
>Assignee: Ufuk Celebi
> Fix For: 1.2.0, 1.3.0
>
>
> I wanted to check the checkpointing mode from the web UI, but it's not 
> displayed there.
> I think there are quite some job-wide settings we can show there.





[jira] [Closed] (FLINK-4410) Report more information about operator checkpoints

2017-01-10 Thread Ufuk Celebi (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-4410?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ufuk Celebi closed FLINK-4410.
--
   Resolution: Implemented
Fix Version/s: 1.3.0

All sub tasks have been implemented.

> Report more information about operator checkpoints
> --
>
> Key: FLINK-4410
> URL: https://issues.apache.org/jira/browse/FLINK-4410
> Project: Flink
>  Issue Type: Improvement
>  Components: State Backends, Checkpointing, Webfrontend
>Affects Versions: 1.1.2
>Reporter: Ufuk Celebi
>Assignee: Ufuk Celebi
> Fix For: 1.2.0, 1.3.0
>
>
> Checkpoint statistics contain the duration of a checkpoint, measured from 
> the CheckpointCoordinator's start to the point when the acknowledgement 
> message arrived.
> We should additionally expose
>   - duration of the synchronous part of a checkpoint
>   - duration of the asynchronous part of a checkpoint
>   - number of bytes buffered during the stream alignment phase
>   - duration of the stream alignment phase
> Note: In the case of using *at-least once* semantics, the latter two will 
> always be zero.



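The sync/async split that FLINK-4410 asks to report can be illustrated with a toy snapshot routine. This is plain Java for illustration only, not Flink's actual checkpointing code; the class and method names are invented:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Toy illustration of splitting a checkpoint into a synchronous part
 *  (snapshotting the state) and an asynchronous part (persisting it),
 *  measuring the duration of each. Not Flink code. */
public class CheckpointTimings {

    /** Returns {syncNanos, asyncNanos} for one "checkpoint" of the given state. */
    public static long[] checkpoint(int[] state) throws Exception {
        long syncStart = System.nanoTime();
        int[] snapshot = state.clone();              // synchronous part: copy the state
        long syncNanos = System.nanoTime() - syncStart;

        ExecutorService pool = Executors.newSingleThreadExecutor();
        long asyncStart = System.nanoTime();
        Future<Integer> write = pool.submit(() -> {  // asynchronous part: "persist" it
            int checksum = 0;
            for (int v : snapshot) {
                checksum += v;
            }
            return checksum;
        });
        write.get();                                 // wait for the async part to finish
        long asyncNanos = System.nanoTime() - asyncStart;
        pool.shutdown();
        return new long[] {syncNanos, asyncNanos};
    }

    public static void main(String[] args) throws Exception {
        long[] d = checkpoint(new int[] {1, 2, 3});
        System.out.println("sync=" + d[0] + "ns async=" + d[1] + "ns");
    }
}
```

In the real tracker these two durations would be attached per operator to the acknowledge message, alongside alignment duration and buffered bytes.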


[jira] [Closed] (FLINK-4698) Visualize additional checkpoint information

2017-01-10 Thread Ufuk Celebi (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-4698?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ufuk Celebi closed FLINK-4698.
--
   Resolution: Implemented
Fix Version/s: 1.3.0

Implemented in 27df801 (master), 3f2e996 (release-1.2).

> Visualize additional checkpoint information
> ---
>
> Key: FLINK-4698
> URL: https://issues.apache.org/jira/browse/FLINK-4698
> Project: Flink
>  Issue Type: Sub-task
>  Components: Webfrontend
>Reporter: Stephan Ewen
>Assignee: Ufuk Celebi
> Fix For: 1.2.0, 1.3.0
>
>
> Display the additional information gathered in the {{CheckpointStatsTracker}} 
> in the "Checkpoint" tab.





[jira] [Closed] (FLINK-4697) Gather more detailed checkpoint stats in CheckpointStatsTracker

2017-01-10 Thread Ufuk Celebi (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-4697?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ufuk Celebi closed FLINK-4697.
--
   Resolution: Implemented
Fix Version/s: 1.3.0

Implemented in dec0d6b (master), 1fd2d2e (release-1.2).

> Gather more detailed checkpoint stats in CheckpointStatsTracker
> ---
>
> Key: FLINK-4697
> URL: https://issues.apache.org/jira/browse/FLINK-4697
> Project: Flink
>  Issue Type: Sub-task
>  Components: State Backends, Checkpointing
>Reporter: Stephan Ewen
>Assignee: Ufuk Celebi
> Fix For: 1.2.0, 1.3.0
>
>
> The additional information attached to the {{AcknowledgeCheckpoint}} method 
> must be gathered in the {{CheckpointStatsTracker}}.





[jira] [Closed] (FLINK-5066) add more efficient isEvent check to EventSerializer

2017-01-09 Thread Ufuk Celebi (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-5066?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ufuk Celebi closed FLINK-5066.
--
   Resolution: Fixed
Fix Version/s: 1.3.0

Implemented in 9457465 (master).

> add more efficient isEvent check to EventSerializer
> ---
>
> Key: FLINK-5066
> URL: https://issues.apache.org/jira/browse/FLINK-5066
> Project: Flink
>  Issue Type: Improvement
>  Components: Network
>Reporter: Nico Kruber
>Assignee: Nico Kruber
> Fix For: 1.3.0
>
>
> -LocalInputChannel#getNextBuffer de-serialises all incoming events on the 
> lookout for an EndOfPartitionEvent.-
> Some buffer code de-serialises all incoming events on the lookout for an 
> EndOfPartitionEvent
> (now applies to PartitionRequestQueue#isEndOfPartitionEvent()).
> Instead, if EventSerializer offered a function to check for an event type 
> only without de-serialising the whole event, we could save some resources.



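The optimization proposed in FLINK-5066 can be sketched in a few lines: if the serialized form carries a type tag up front, a consumer can check the event type by peeking at the tag instead of deserializing the whole event. The frame layout and names below are invented for illustration and are not Flink's actual wire format:

```java
import java.nio.ByteBuffer;

/** Sketch: check the event type from a serialized buffer without
 *  deserializing it. Hypothetical frame layout: [4-byte type tag][payload]. */
public class EventPeek {

    public static final int TAG_RECORD = 0;
    public static final int TAG_END_OF_PARTITION = 1;

    /** Serialize as a 4-byte type tag followed by the payload. */
    public static ByteBuffer serialize(int tag, byte[] payload) {
        ByteBuffer buf = ByteBuffer.allocate(4 + payload.length);
        buf.putInt(tag).put(payload);
        buf.flip();
        return buf;
    }

    /** Cheap check: absolute read of the tag, buffer position is untouched. */
    public static boolean isEndOfPartitionEvent(ByteBuffer buf) {
        return buf.getInt(buf.position()) == TAG_END_OF_PARTITION;
    }

    public static void main(String[] args) {
        ByteBuffer event = serialize(TAG_END_OF_PARTITION, new byte[0]);
        ByteBuffer record = serialize(TAG_RECORD, new byte[] {42});
        System.out.println(isEndOfPartitionEvent(event));   // true
        System.out.println(isEndOfPartitionEvent(record));  // false
    }
}
```

The point is that the buffer is left untouched for the fast path, so non-event data flows through without any deserialization cost.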


[jira] [Closed] (FLINK-5059) only serialise events once in RecordWriter#broadcastEvent

2017-01-09 Thread Ufuk Celebi (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-5059?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ufuk Celebi closed FLINK-5059.
--
   Resolution: Implemented
Fix Version/s: 1.3.0

Fixed in 9cff8c9 (master).

> only serialise events once in RecordWriter#broadcastEvent
> -
>
> Key: FLINK-5059
> URL: https://issues.apache.org/jira/browse/FLINK-5059
> Project: Flink
>  Issue Type: Improvement
>  Components: Network
>Reporter: Nico Kruber
>Assignee: Nico Kruber
> Fix For: 1.3.0
>
>
> Currently, 
> org.apache.flink.runtime.io.network.api.writer.RecordWriter#broadcastEvent 
> serialises the event once per target channel. Instead, it could serialise the 
> event only once and use the serialised form for every channel and thus save 
> resources.



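The serialize-once idea from FLINK-5059 boils down to hoisting the serialization out of the per-channel loop. The sketch below is illustrative plain Java, not Flink's RecordWriter API; the counter makes the saving visible:

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/** Sketch: serialize a broadcast event once and reuse the serialized form
 *  for every target channel, instead of re-serializing per channel. */
public class BroadcastOnce {

    public static int serializationCount = 0;

    public static byte[] serialize(String event) {
        serializationCount++;                    // count how often we pay the cost
        return event.getBytes(StandardCharsets.UTF_8);
    }

    /** Before: one serialization per target channel. */
    public static List<byte[]> broadcastPerChannel(String event, int channels) {
        List<byte[]> sent = new ArrayList<>();
        for (int i = 0; i < channels; i++) {
            sent.add(serialize(event));          // channels * serialize() calls
        }
        return sent;
    }

    /** After: serialize once, share the read-only buffer across channels. */
    public static List<byte[]> broadcastOnce(String event, int channels) {
        byte[] serialized = serialize(event);    // exactly one serialize() call
        List<byte[]> sent = new ArrayList<>();
        for (int i = 0; i < channels; i++) {
            sent.add(serialized);                // same buffer handed to each channel
        }
        return sent;
    }

    public static void main(String[] args) {
        broadcastPerChannel("end-of-superstep", 4);
        int perChannel = serializationCount;
        serializationCount = 0;
        broadcastOnce("end-of-superstep", 4);
        System.out.println(perChannel + " vs " + serializationCount);
    }
}
```

Sharing one buffer is safe here only because the serialized form is never mutated after creation, which is also the precondition in the real fix.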


[jira] [Created] (FLINK-5428) Decrease restart delay in RecoveryITCase

2017-01-09 Thread Ufuk Celebi (JIRA)
Ufuk Celebi created FLINK-5428:
--

 Summary: Decrease restart delay in RecoveryITCase 
 Key: FLINK-5428
 URL: https://issues.apache.org/jira/browse/FLINK-5428
 Project: Flink
  Issue Type: Improvement
  Components: Tests
Reporter: Ufuk Celebi
Priority: Minor


RecoveryITCase takes around 30 seconds to run, because it uses the default 
restart delay of 10 seconds for each of its three tests. This can be sped up by 
decreasing the restart delay in the test.

We would have to check whether the tests are sensitive to this. If not, it's 
simply a setting in the ExecutionConfig.





[jira] [Closed] (FLINK-4131) Confusing error for out dated RequestPartitionState

2016-12-23 Thread Ufuk Celebi (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-4131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ufuk Celebi closed FLINK-4131.
--
Resolution: Duplicate

> Confusing error for out dated RequestPartitionState
> ---
>
> Key: FLINK-4131
> URL: https://issues.apache.org/jira/browse/FLINK-4131
> Project: Flink
>  Issue Type: Improvement
>Reporter: Ufuk Celebi
>Assignee: Ufuk Celebi
>
> When a consumer requests a partition state for an old job or execution 
> attempt (e.g. failed job or attempt), the JobManager answers with a {{null}} 
> state, which fails the requesting task with the following cause: 
> {{IllegalStateException("Received unexpected partition state null for 
> partition request. This is a bug.")}}.
> This is confusing to the user as one might think that this is the root 
> failure cause.
> I propose to either ignore the null state at the Task or not respond on the 
> JobManager side if the job or execution attempt has been cleared (see 
> {{RequestPartitionState}} in {{JobManager.scala}}).





[jira] [Closed] (FLINK-5180) Include blocked on bounded queue length in back pressure stats

2016-12-23 Thread Ufuk Celebi (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-5180?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ufuk Celebi closed FLINK-5180.
--
Resolution: Invalid

> Include blocked on bounded queue length in back pressure stats
> --
>
> Key: FLINK-5180
> URL: https://issues.apache.org/jira/browse/FLINK-5180
> Project: Flink
>  Issue Type: Improvement
>Reporter: Ufuk Celebi
>Assignee: Ufuk Celebi
>
> As a follow up to FLINK-5088, we need to adjust the back pressure stats 
> tracker to report back pressure when the task is blocked on the introduced 
> capacity limit. Currently, only blocking buffer requests are accounted for.





[jira] [Updated] (FLINK-4651) Re-register processing time timers at the WindowOperator upon recovery.

2016-12-23 Thread Ufuk Celebi (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-4651?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ufuk Celebi updated FLINK-4651:
---
Fix Version/s: (was: 1.1.4)
   1.1.5

> Re-register processing time timers at the WindowOperator upon recovery.
> ---
>
> Key: FLINK-4651
> URL: https://issues.apache.org/jira/browse/FLINK-4651
> Project: Flink
>  Issue Type: Bug
>  Components: Windowing Operators
>Reporter: Kostas Kloudas
>Assignee: Kostas Kloudas
>  Labels: windows
> Fix For: 1.2.0, 1.1.5
>
>
> Currently the {{WindowOperator}} checkpoints the processing time timers, but 
> upon recovery it does not re-register them with the {{TimeServiceProvider}}. 
> To actually reprocess them it relies on another element that will come and 
> register a new timer for a future point in time. Although this is a realistic 
> assumption in long running jobs, we can remove this assumption by 
> re-registering the restored timers with the {{TimeServiceProvider}} in the 
> {{open()}} method of the {{WindowOperator}}.





[jira] [Updated] (FLINK-5322) Clean up yarn configuration documentation

2016-12-23 Thread Ufuk Celebi (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-5322?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ufuk Celebi updated FLINK-5322:
---
Fix Version/s: (was: 1.1.4)
   1.1.5

> Clean up yarn configuration documentation
> -
>
> Key: FLINK-5322
> URL: https://issues.apache.org/jira/browse/FLINK-5322
> Project: Flink
>  Issue Type: Improvement
>  Components: Documentation, YARN
>Affects Versions: 1.2.0, 1.1.3
> Environment: Flink 1.1.3 on AWS EMR emr-5.2.0 (Hadoop "Amazon 2.7.3")
>Reporter: Shannon Carey
>Assignee: Chesnay Schepler
>Priority: Trivial
> Fix For: 1.2.0, 1.1.5
>
>
> The value I specified in flink-conf.yaml
> {code}
> yarn.taskmanager.env:
>   MY_ENV: test
> {code}
> is not available in {{System.getenv("MY_ENV")}} from the plan execution 
> (execution flow of main method) nor from within execution of a streaming 
> operator.
> Interestingly, it does appear within the Flink JobManager Web UI under Job 
> Manager -> Configuration.
> --
> The yarn section of the configuration page should be cleaned up a bit. The 
> "yarn.containers.vcores" parameter is listed twice, the example for 
> "yarn.application-master.env" is listed as a separate parameter and the 
> "yarn.taskmanager.env" description indirectly references another parameter 
> ("same as the above") which just isn't maintainable; instead it should be 
> described similarly to the application-master entry.





[jira] [Updated] (FLINK-5210) Failing performCheckpoint method causes task to fail

2016-12-23 Thread Ufuk Celebi (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-5210?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ufuk Celebi updated FLINK-5210:
---
Fix Version/s: (was: 1.1.4)
   1.1.5

> Failing performCheckpoint method causes task to fail 
> -
>
> Key: FLINK-5210
> URL: https://issues.apache.org/jira/browse/FLINK-5210
> Project: Flink
>  Issue Type: Bug
>  Components: TaskManager
>Affects Versions: 1.2.0, 1.1.3
>Reporter: Till Rohrmann
>Assignee: Till Rohrmann
> Fix For: 1.2.0, 1.1.5
>
>
> A failure in {{StreamTask#performCheckpoint}} causes the {{Task}} to fail. 
> This should not be the case and instead the checkpoint files should be 
> cleaned up and the current checkpoint should be declined.





[jira] [Updated] (FLINK-5227) Add warning to include flink-table in job fat jars

2016-12-23 Thread Ufuk Celebi (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-5227?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ufuk Celebi updated FLINK-5227:
---
Fix Version/s: (was: 1.1.4)
   1.1.5

> Add warning to include flink-table in job fat jars
> --
>
> Key: FLINK-5227
> URL: https://issues.apache.org/jira/browse/FLINK-5227
> Project: Flink
>  Issue Type: Improvement
>  Components: Documentation, Table API & SQL
>Affects Versions: 1.2.0, 1.1.3
>Reporter: Fabian Hueske
> Fix For: 1.2.0, 1.1.5
>
>
> {{flink-table}} depends on Apache Calcite, which includes a JDBC driver that 
> prevents classloaders from being collected. This is a known issue with 
> {{java.sql.DriverManager}} and can eventually cause PermGen OOME failures on 
> the TaskManagers.
> The current workaround is to not include {{flink-table}} in the fat job JAR. 
> Instead the {{flink-table}} jar files should be added to the {{lib}} folder 
> of the TaskManagers.



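The leak pattern behind FLINK-5227 can be reduced to a minimal sketch: a JVM-wide static registry (here a plain list standing in for java.sql.DriverManager's driver registry) keeps a strong reference to an object loaded by a job's classloader, so that classloader can never be collected while the registration remains. All names below are invented for illustration:

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch of the classloader-leak pattern: a static registry outlives the
 *  job and pins whatever the job registered. Not real DriverManager code. */
public class ClassLoaderLeakSketch {

    /** Stands in for DriverManager's global driver registry. */
    public static final List<Object> REGISTRY = new ArrayList<>();

    public static void register(Object driver) {
        REGISTRY.add(driver);                // strong reference held forever...
    }

    public static void deregister(Object driver) {
        REGISTRY.remove(driver);             // ...unless explicitly cleaned up
    }

    public static void main(String[] args) {
        Object jobScopedDriver = new Object();   // stands in for Calcite's JDBC driver
        register(jobScopedDriver);
        System.out.println(REGISTRY.size());     // reference retained after job "ends"
        deregister(jobScopedDriver);
        System.out.println(REGISTRY.size());     // classloader collectible again
    }
}
```

This is why moving {{flink-table}} to the TaskManager's {{lib}} folder works: the driver is then loaded by the system classloader, which is expected to live as long as the JVM anyway.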


[jira] [Updated] (FLINK-5302) Log failure cause at Execution

2016-12-23 Thread Ufuk Celebi (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-5302?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ufuk Celebi updated FLINK-5302:
---
Fix Version/s: (was: 1.1.4)
   (was: 1.2.0)

> Log failure cause at Execution 
> ---
>
> Key: FLINK-5302
> URL: https://issues.apache.org/jira/browse/FLINK-5302
> Project: Flink
>  Issue Type: Improvement
>Reporter: Ufuk Celebi
>
> It can be helpful to log the failure cause that made an {{Execution}} switch 
> to state {{FAILED}}. We currently only see a "root cause" logged on the 
> JobManager, which happens to be the first failure cause that makes it to 
> {{ExecutionGraph#fail()}}. This depends on relative timings of messages. For 
> debugging it can be helpful to have all causes available.





[jira] [Updated] (FLINK-5233) Upgrade Jackson version because of class loader leak

2016-12-23 Thread Ufuk Celebi (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-5233?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ufuk Celebi updated FLINK-5233:
---
Fix Version/s: (was: 1.1.4)

> Upgrade Jackson version because of class loader leak
> 
>
> Key: FLINK-5233
> URL: https://issues.apache.org/jira/browse/FLINK-5233
> Project: Flink
>  Issue Type: Bug
>Affects Versions: 1.1.3
>Reporter: Robert Metzger
>Assignee: Robert Metzger
>Priority: Critical
>
> A user reported this issue on the ML:
> http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Flink-1-1-3-OOME-Permgen-td10379.html
> I propose to upgrade to Jackson 2.7.8, as this version contains the fix for 
> the issue while not being a major Jackson upgrade.





[jira] [Updated] (FLINK-5375) Fix watermark documentation

2016-12-23 Thread Ufuk Celebi (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-5375?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ufuk Celebi updated FLINK-5375:
---
Fix Version/s: (was: 1.1.4)
   1.1.5

> Fix watermark documentation
> ---
>
> Key: FLINK-5375
> URL: https://issues.apache.org/jira/browse/FLINK-5375
> Project: Flink
>  Issue Type: Bug
>  Components: Documentation, Project Website
>Affects Versions: 1.2.0, 1.1.3, 1.3.0
>Reporter: Fabian Hueske
>Priority: Critical
> Fix For: 1.2.0, 1.3.0, 1.1.5
>
>
> The [documentation of 
> watermarks|https://ci.apache.org/projects/flink/flink-docs-release-1.2/dev/event_time.html#event-time-and-watermarks]
>  is not correct. It states 
> {quote}
> A Watermark(t) declares that event time has reached time t in that stream, 
> meaning that all events with timestamps t' < t have occurred.
> {quote}
> whereas the JavaDocs which is aligned with implementation says
> {quote}
> A Watermark tells operators that receive it that no elements with a
> timestamp older or equal to the watermark timestamp should arrive at the
> operator.
> {quote}
> The documentation needs to be updated. Moreover, we need to carefully check 
> that the watermark semantics are correctly described in other pages of the 
> documentation and blog posts published on the Flink website.



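The JavaDoc semantics quoted above can be pinned down in a few lines of plain Java (this is an illustrative predicate, not Flink's operator code): after Watermark(t), no element with a timestamp older than or equal to t should arrive, so such elements are late.

```java
/** Sketch of the watermark contract as stated in the JavaDocs:
 *  an element is late iff its timestamp is <= the current watermark. */
public class WatermarkSemantics {

    public static boolean isLate(long elementTimestamp, long currentWatermark) {
        // "no elements with a timestamp older or equal to the watermark
        // timestamp should arrive" after the watermark has been received
        return elementTimestamp <= currentWatermark;
    }

    public static void main(String[] args) {
        System.out.println(isLate(5, 5));  // true: equal to the watermark -> late
        System.out.println(isLate(6, 5));  // false: strictly newer -> on time
    }
}
```

Note the "or equal" in the contract: this is exactly where the documentation's "t' < t" wording diverges from the implementation, which is the inconsistency the issue asks to fix.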


[jira] [Created] (FLINK-5377) Improve savepoint docs

2016-12-21 Thread Ufuk Celebi (JIRA)
Ufuk Celebi created FLINK-5377:
--

 Summary: Improve savepoint docs
 Key: FLINK-5377
 URL: https://issues.apache.org/jira/browse/FLINK-5377
 Project: Flink
  Issue Type: Improvement
  Components: Documentation
Affects Versions: 1.1.3, 1.2.0
Reporter: Ufuk Celebi
Assignee: Ufuk Celebi


The savepoint docs are very detailed and focus on the internals. They should 
better convey what users have to take care of.

The following questions should be answered:
What happens if I add a new operator that requires state to my flow?
What happens if I delete an operator that has state to my flow?
What happens if I reorder stateful operators in my flow?
What happens if I add or delete or reorder operators that have no state in my 
flow?
Should I apply .uid to all operators in my flow?
Should I apply .uid to only the operators that have state?






[jira] [Closed] (FLINK-5352) Restore RocksDB 1.1.3 memory behavior

2016-12-16 Thread Ufuk Celebi (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-5352?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ufuk Celebi closed FLINK-5352.
--
Resolution: Fixed

Fixed in a8b415f, a874ad8 (release-1.1).

> Restore RocksDB 1.1.3 memory behavior
> -
>
> Key: FLINK-5352
> URL: https://issues.apache.org/jira/browse/FLINK-5352
> Project: Flink
>  Issue Type: Bug
>  Components: State Backends, Checkpointing
>Affects Versions: 1.1.4
>Reporter: Stephan Ewen
>Assignee: Stephan Ewen
> Fix For: 1.1.4
>
>
> The current Release Candidates for 1.1.4 use an updated RocksDB version with 
> a changed memory footprint, which makes setups unstable that use a 1.1.3 
> configuration.
> We should change the "predefined options" in the RocksDB state backend to 
> reflect the 1.1.3 settings for the 1.1.4 release.





[jira] [Closed] (FLINK-5326) IllegalStateException: Bug in Netty consumer logic: reader queue got notified by partition about available data, but none was available

2016-12-13 Thread Ufuk Celebi (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-5326?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ufuk Celebi closed FLINK-5326.
--
   Resolution: Fixed
Fix Version/s: 1.1.4
   1.2.0

Fixed in 04db15a, 9ed7752 (release-1.1) and d965d5a, fc62723 (master).

> IllegalStateException: Bug in Netty consumer logic: reader queue got notified 
> by partition about available data,  but none was available
> 
>
> Key: FLINK-5326
> URL: https://issues.apache.org/jira/browse/FLINK-5326
> Project: Flink
>  Issue Type: Bug
>  Components: Network
>Affects Versions: 1.2.0, 1.1.4
>Reporter: Robert Metzger
>Assignee: Ufuk Celebi
>  Labels: qa
> Fix For: 1.2.0, 1.1.4
>
>
> {code}
> 2016-12-10 23:56:39,056 DEBUG 
> org.apache.flink.runtime.io.network.partition.ResultPartition  - Source: 
> control events generator (1/40) (3360ced43a57fed83904f22e93281ce0): Releasing 
> ResultPartition 
> e585300594a68036b0983cefaf048e17@3360ced43a57fed83904f22e93281ce0 [PIPELINED, 
> 1 subpartitions, 0 pending references].
> 2016-12-10 23:56:39,056 INFO  org.apache.flink.runtime.taskmanager.Task   
>   - dynamic filter (1/40) (b1b7284e0b4a6ba08a16c50dcf13ff0d) 
> switched from RUNNING to FAILED.
> org.apache.flink.runtime.io.network.netty.exception.RemoteTransportException: 
> Fatal error at remote task manager 
> 'permanent-qa-cluster-2wv1/10.240.0.27:45062'.
> at 
> org.apache.flink.runtime.io.network.netty.PartitionRequestClientHandler.decodeMsg(PartitionRequestClientHandler.java:229)
> at 
> org.apache.flink.runtime.io.network.netty.PartitionRequestClientHandler.channelRead(PartitionRequestClientHandler.java:164)
> at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339)
> at 
> io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324)
> at 
> io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:103)
> at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339)
> at 
> io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324)
> at 
> io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:242)
> at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339)
> at 
> io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324)
> at 
> io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:847)
> at 
> io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:131)
> at 
> io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511)
> at 
> io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)
> at 
> io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)
> at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)
> at 
> io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
> at java.lang.Thread.run(Thread.java:745)
> Caused by: java.lang.IllegalStateException: Bug in Netty consumer logic: 
> reader queue got notified by partition about available data, but none was 
> available.
> at 
> org.apache.flink.runtime.io.network.netty.PartitionRequestQueue.writeAndFlushNextMessageIfPossible(PartitionRequestQueue.java:177)
> at 
> org.apache.flink.runtime.io.network.netty.PartitionRequestQueue.userEventTriggered(PartitionRequestQueue.java:111)
> at 
> io.netty.channel.AbstractChannelHandlerContext.invokeUserEventTriggered(AbstractChannelHandlerContext.java:308)
> at 
> io.netty.channel.AbstractChannelHandlerContext.fireUserEventTriggered(AbstractChannelHandlerContext.java:294)
> at 
> io.netty.channel.ChannelInboundHandlerAdapter.userEventTriggered(ChannelInboundHandlerAdapter.java:108)
> at 
> io.netty.channel.AbstractChannelHandlerContext.invokeUserEventTriggered(AbstractChannelHandlerContext.java:308)
> at 
> io.netty.channel.AbstractChannelHandlerContext.fireUserEventTriggered(AbstractChannelHandlerContext.java:294)
> at 
> io.netty.channel.ChannelInboundHandlerAdapter.userEventTriggered(ChannelInboundHandlerAdapter.java:108)
> at 
> io.netty.channel.AbstractChannelHandlerContext.invokeUserEventTriggered(AbstractChannelHandlerContext.java:308)
> at 
> 

[jira] [Commented] (FLINK-5328) ConnectedComponentsITCase testJobWithoutObjectReuse fails

2016-12-13 Thread Ufuk Celebi (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-5328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15745056#comment-15745056
 ] 

Ufuk Celebi commented on FLINK-5328:


The attached logs also show this: 
{code}
12:00:31,315 ERROR 
org.apache.flink.runtime.iterative.task.AbstractIterativeTask  - Error while 
shutting down an iterative operator.
java.lang.NullPointerException
at java.util.ArrayList.addAll(ArrayList.java:559)
at 
org.apache.flink.runtime.operators.hash.ReOpenableMutableHashTable.close(ReOpenableMutableHashTable.java:184)
at 
org.apache.flink.runtime.operators.hash.NonReusingBuildSecondHashJoinIterator.close(NonReusingBuildSecondHashJoinIterator.java:103)
at 
org.apache.flink.runtime.operators.AbstractCachedBuildSideJoinDriver.teardown(AbstractCachedBuildSideJoinDriver.java:213)
at 
org.apache.flink.runtime.iterative.task.AbstractIterativeTask.closeLocalStrategiesAndCaches(AbstractIterativeTask.java:164)
at 
org.apache.flink.runtime.operators.BatchTask.invoke(BatchTask.java:359)
at org.apache.flink.runtime.taskmanager.Task.run(Task.java:654)
at java.lang.Thread.run(Thread.java:745)
{code}

> ConnectedComponentsITCase testJobWithoutObjectReuse fails
> -
>
> Key: FLINK-5328
> URL: https://issues.apache.org/jira/browse/FLINK-5328
> Project: Flink
>  Issue Type: Test
>Reporter: Ufuk Celebi
>  Labels: test-stability
> Attachments: connectedComponentsFailure.txt
>
>
> I've seen this fail a couple of times now: 
> ConnectedComponentsITCase#testJobWithoutObjectReuse.
> {code}
> Job execution failed.
> org.apache.flink.runtime.client.JobExecutionException: Job execution failed.
>   at 
> org.apache.flink.runtime.jobmanager.JobManager$$anonfun$handleMessage$1$$anonfun$applyOrElse$6.apply$mcV$sp(JobManager.scala:900)
>   at 
> org.apache.flink.runtime.jobmanager.JobManager$$anonfun$handleMessage$1$$anonfun$applyOrElse$6.apply(JobManager.scala:843)
>   at 
> org.apache.flink.runtime.jobmanager.JobManager$$anonfun$handleMessage$1$$anonfun$applyOrElse$6.apply(JobManager.scala:843)
>   at 
> scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24)
>   at 
> scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24)
>   at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:41)
>   at 
> akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:401)
>   at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
>   at 
> scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
>   at 
> scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
>   at 
> scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
> Caused by: java.io.IOException: Stream Closed
>   at java.io.FileInputStream.readBytes(Native Method)
>   at java.io.FileInputStream.read(FileInputStream.java:272)
>   at 
> org.apache.flink.core.fs.local.LocalDataInputStream.read(LocalDataInputStream.java:72)
>   at 
> org.apache.flink.core.fs.FSDataInputStreamWrapper.read(FSDataInputStreamWrapper.java:59)
>   at 
> org.apache.flink.api.common.io.DelimitedInputFormat.fillBuffer(DelimitedInputFormat.java:662)
>   at 
> org.apache.flink.api.common.io.DelimitedInputFormat.readLine(DelimitedInputFormat.java:556)
>   at 
> org.apache.flink.api.common.io.DelimitedInputFormat.nextRecord(DelimitedInputFormat.java:522)
>   at 
> org.apache.flink.api.java.io.CsvInputFormat.nextRecord(CsvInputFormat.java:78)
>   at 
> org.apache.flink.runtime.operators.DataSourceTask.invoke(DataSourceTask.java:166)
>   at org.apache.flink.runtime.taskmanager.Task.run(Task.java:654)
>   at java.lang.Thread.run(Thread.java:745)
> {code}
> Complete logs are attached.





[jira] [Updated] (FLINK-5328) ConnectedComponentsITCase testJobWithoutObjectReuse fails

2016-12-13 Thread Ufuk Celebi (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-5328?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ufuk Celebi updated FLINK-5328:
---
Attachment: connectedComponentsFailure.txt

> ConnectedComponentsITCase testJobWithoutObjectReuse fails
> -
>
> Key: FLINK-5328
> URL: https://issues.apache.org/jira/browse/FLINK-5328
> Project: Flink
>  Issue Type: Test
>Reporter: Ufuk Celebi
>  Labels: test-stability
> Attachments: connectedComponentsFailure.txt
>
>
> I've seen this fail a couple of times now: 
> ConnectedComponentsITCase#testJobWithoutObjectReuse.
> {code}
> Job execution failed.
> org.apache.flink.runtime.client.JobExecutionException: Job execution failed.
>   at 
> org.apache.flink.runtime.jobmanager.JobManager$$anonfun$handleMessage$1$$anonfun$applyOrElse$6.apply$mcV$sp(JobManager.scala:900)
>   at 
> org.apache.flink.runtime.jobmanager.JobManager$$anonfun$handleMessage$1$$anonfun$applyOrElse$6.apply(JobManager.scala:843)
>   at 
> org.apache.flink.runtime.jobmanager.JobManager$$anonfun$handleMessage$1$$anonfun$applyOrElse$6.apply(JobManager.scala:843)
>   at 
> scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24)
>   at 
> scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24)
>   at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:41)
>   at 
> akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:401)
>   at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
>   at 
> scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
>   at 
> scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
>   at 
> scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
> Caused by: java.io.IOException: Stream Closed
>   at java.io.FileInputStream.readBytes(Native Method)
>   at java.io.FileInputStream.read(FileInputStream.java:272)
>   at 
> org.apache.flink.core.fs.local.LocalDataInputStream.read(LocalDataInputStream.java:72)
>   at 
> org.apache.flink.core.fs.FSDataInputStreamWrapper.read(FSDataInputStreamWrapper.java:59)
>   at 
> org.apache.flink.api.common.io.DelimitedInputFormat.fillBuffer(DelimitedInputFormat.java:662)
>   at 
> org.apache.flink.api.common.io.DelimitedInputFormat.readLine(DelimitedInputFormat.java:556)
>   at 
> org.apache.flink.api.common.io.DelimitedInputFormat.nextRecord(DelimitedInputFormat.java:522)
>   at 
> org.apache.flink.api.java.io.CsvInputFormat.nextRecord(CsvInputFormat.java:78)
>   at 
> org.apache.flink.runtime.operators.DataSourceTask.invoke(DataSourceTask.java:166)
>   at org.apache.flink.runtime.taskmanager.Task.run(Task.java:654)
>   at java.lang.Thread.run(Thread.java:745)
> {code}
> Complete logs are attached.





[jira] [Created] (FLINK-5328) ConnectedComponentsITCase testJobWithoutObjectReuse fails

2016-12-13 Thread Ufuk Celebi (JIRA)
Ufuk Celebi created FLINK-5328:
--

 Summary: ConnectedComponentsITCase testJobWithoutObjectReuse fails
 Key: FLINK-5328
 URL: https://issues.apache.org/jira/browse/FLINK-5328
 Project: Flink
  Issue Type: Test
Reporter: Ufuk Celebi


I've seen this fail a couple of times now: 
ConnectedComponentsITCase#testJobWithoutObjectReuse.

{code}
Job execution failed.
org.apache.flink.runtime.client.JobExecutionException: Job execution failed.
at 
org.apache.flink.runtime.jobmanager.JobManager$$anonfun$handleMessage$1$$anonfun$applyOrElse$6.apply$mcV$sp(JobManager.scala:900)
at 
org.apache.flink.runtime.jobmanager.JobManager$$anonfun$handleMessage$1$$anonfun$applyOrElse$6.apply(JobManager.scala:843)
at 
org.apache.flink.runtime.jobmanager.JobManager$$anonfun$handleMessage$1$$anonfun$applyOrElse$6.apply(JobManager.scala:843)
at 
scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24)
at 
scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24)
at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:41)
at 
akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:401)
at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
at 
scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
at 
scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
at 
scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
Caused by: java.io.IOException: Stream Closed
at java.io.FileInputStream.readBytes(Native Method)
at java.io.FileInputStream.read(FileInputStream.java:272)
at 
org.apache.flink.core.fs.local.LocalDataInputStream.read(LocalDataInputStream.java:72)
at 
org.apache.flink.core.fs.FSDataInputStreamWrapper.read(FSDataInputStreamWrapper.java:59)
at 
org.apache.flink.api.common.io.DelimitedInputFormat.fillBuffer(DelimitedInputFormat.java:662)
at 
org.apache.flink.api.common.io.DelimitedInputFormat.readLine(DelimitedInputFormat.java:556)
at 
org.apache.flink.api.common.io.DelimitedInputFormat.nextRecord(DelimitedInputFormat.java:522)
at 
org.apache.flink.api.java.io.CsvInputFormat.nextRecord(CsvInputFormat.java:78)
at 
org.apache.flink.runtime.operators.DataSourceTask.invoke(DataSourceTask.java:166)
at org.apache.flink.runtime.taskmanager.Task.run(Task.java:654)
at java.lang.Thread.run(Thread.java:745)
{code}

Complete logs are attached.






[jira] [Updated] (FLINK-5326) IllegalStateException: Bug in Netty consumer logic: reader queue got notified by partition about available data, but none was available

2016-12-13 Thread Ufuk Celebi (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-5326?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ufuk Celebi updated FLINK-5326:
---
Affects Version/s: 1.2.0

> IllegalStateException: Bug in Netty consumer logic: reader queue got notified 
> by partition about available data,  but none was available
> 
>
> Key: FLINK-5326
> URL: https://issues.apache.org/jira/browse/FLINK-5326
> Project: Flink
>  Issue Type: Bug
>  Components: Network
>Affects Versions: 1.2.0, 1.1.4
>Reporter: Robert Metzger
>Assignee: Ufuk Celebi
>  Labels: qa
>
> {code}
> 2016-12-10 23:56:39,056 DEBUG 
> org.apache.flink.runtime.io.network.partition.ResultPartition  - Source: 
> control events generator (1/40) (3360ced43a57fed83904f22e93281ce0): Releasing 
> ResultPartition 
> e585300594a68036b0983cefaf048e17@3360ced43a57fed83904f22e93281ce0 [PIPELINED, 
> 1 subpartitions, 0 pending references].
> 2016-12-10 23:56:39,056 INFO  org.apache.flink.runtime.taskmanager.Task   
>   - dynamic filter (1/40) (b1b7284e0b4a6ba08a16c50dcf13ff0d) 
> switched from RUNNING to FAILED.
> org.apache.flink.runtime.io.network.netty.exception.RemoteTransportException: 
> Fatal error at remote task manager 
> 'permanent-qa-cluster-2wv1/10.240.0.27:45062'.
> at 
> org.apache.flink.runtime.io.network.netty.PartitionRequestClientHandler.decodeMsg(PartitionRequestClientHandler.java:229)
> at 
> org.apache.flink.runtime.io.network.netty.PartitionRequestClientHandler.channelRead(PartitionRequestClientHandler.java:164)
> at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339)
> at 
> io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324)
> at 
> io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:103)
> at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339)
> at 
> io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324)
> at 
> io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:242)
> at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339)
> at 
> io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324)
> at 
> io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:847)
> at 
> io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:131)
> at 
> io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511)
> at 
> io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)
> at 
> io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)
> at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)
> at 
> io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
> at java.lang.Thread.run(Thread.java:745)
> Caused by: java.lang.IllegalStateException: Bug in Netty consumer logic: 
> reader queue got notified by partition about available data, but none was 
> available.
> at 
> org.apache.flink.runtime.io.network.netty.PartitionRequestQueue.writeAndFlushNextMessageIfPossible(PartitionRequestQueue.java:177)
> at 
> org.apache.flink.runtime.io.network.netty.PartitionRequestQueue.userEventTriggered(PartitionRequestQueue.java:111)
> at 
> io.netty.channel.AbstractChannelHandlerContext.invokeUserEventTriggered(AbstractChannelHandlerContext.java:308)
> at 
> io.netty.channel.AbstractChannelHandlerContext.fireUserEventTriggered(AbstractChannelHandlerContext.java:294)
> at 
> io.netty.channel.ChannelInboundHandlerAdapter.userEventTriggered(ChannelInboundHandlerAdapter.java:108)
> at 
> io.netty.channel.AbstractChannelHandlerContext.invokeUserEventTriggered(AbstractChannelHandlerContext.java:308)
> at 
> io.netty.channel.AbstractChannelHandlerContext.fireUserEventTriggered(AbstractChannelHandlerContext.java:294)
> at 
> io.netty.channel.ChannelInboundHandlerAdapter.userEventTriggered(ChannelInboundHandlerAdapter.java:108)
> at 
> io.netty.channel.AbstractChannelHandlerContext.invokeUserEventTriggered(AbstractChannelHandlerContext.java:308)
> at 
> io.netty.channel.AbstractChannelHandlerContext.fireUserEventTriggered(AbstractChannelHandlerContext.java:294)
> at 
> 

[jira] [Commented] (FLINK-5326) IllegalStateException: Bug in Netty consumer logic: reader queue got notified by partition about available data, but none was available

2016-12-13 Thread Ufuk Celebi (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-5326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15744744#comment-15744744
 ] 

Ufuk Celebi commented on FLINK-5326:


No, 1.2.0 too.

> IllegalStateException: Bug in Netty consumer logic: reader queue got notified 
> by partition about available data,  but none was available
> 
>
> Key: FLINK-5326
> URL: https://issues.apache.org/jira/browse/FLINK-5326
> Project: Flink
>  Issue Type: Bug
>  Components: Network
>Affects Versions: 1.2.0, 1.1.4
>Reporter: Robert Metzger
>Assignee: Ufuk Celebi
>  Labels: qa
>
> {code}
> 2016-12-10 23:56:39,056 DEBUG 
> org.apache.flink.runtime.io.network.partition.ResultPartition  - Source: 
> control events generator (1/40) (3360ced43a57fed83904f22e93281ce0): Releasing 
> ResultPartition 
> e585300594a68036b0983cefaf048e17@3360ced43a57fed83904f22e93281ce0 [PIPELINED, 
> 1 subpartitions, 0 pending references].
> 2016-12-10 23:56:39,056 INFO  org.apache.flink.runtime.taskmanager.Task   
>   - dynamic filter (1/40) (b1b7284e0b4a6ba08a16c50dcf13ff0d) 
> switched from RUNNING to FAILED.
> org.apache.flink.runtime.io.network.netty.exception.RemoteTransportException: 
> Fatal error at remote task manager 
> 'permanent-qa-cluster-2wv1/10.240.0.27:45062'.
> at 
> org.apache.flink.runtime.io.network.netty.PartitionRequestClientHandler.decodeMsg(PartitionRequestClientHandler.java:229)
> at 
> org.apache.flink.runtime.io.network.netty.PartitionRequestClientHandler.channelRead(PartitionRequestClientHandler.java:164)
> at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339)
> at 
> io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324)
> at 
> io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:103)
> at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339)
> at 
> io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324)
> at 
> io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:242)
> at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339)
> at 
> io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324)
> at 
> io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:847)
> at 
> io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:131)
> at 
> io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511)
> at 
> io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)
> at 
> io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)
> at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)
> at 
> io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
> at java.lang.Thread.run(Thread.java:745)
> Caused by: java.lang.IllegalStateException: Bug in Netty consumer logic: 
> reader queue got notified by partition about available data, but none was 
> available.
> at 
> org.apache.flink.runtime.io.network.netty.PartitionRequestQueue.writeAndFlushNextMessageIfPossible(PartitionRequestQueue.java:177)
> at 
> org.apache.flink.runtime.io.network.netty.PartitionRequestQueue.userEventTriggered(PartitionRequestQueue.java:111)
> at 
> io.netty.channel.AbstractChannelHandlerContext.invokeUserEventTriggered(AbstractChannelHandlerContext.java:308)
> at 
> io.netty.channel.AbstractChannelHandlerContext.fireUserEventTriggered(AbstractChannelHandlerContext.java:294)
> at 
> io.netty.channel.ChannelInboundHandlerAdapter.userEventTriggered(ChannelInboundHandlerAdapter.java:108)
> at 
> io.netty.channel.AbstractChannelHandlerContext.invokeUserEventTriggered(AbstractChannelHandlerContext.java:308)
> at 
> io.netty.channel.AbstractChannelHandlerContext.fireUserEventTriggered(AbstractChannelHandlerContext.java:294)
> at 
> io.netty.channel.ChannelInboundHandlerAdapter.userEventTriggered(ChannelInboundHandlerAdapter.java:108)
> at 
> io.netty.channel.AbstractChannelHandlerContext.invokeUserEventTriggered(AbstractChannelHandlerContext.java:308)
> at 
> io.netty.channel.AbstractChannelHandlerContext.fireUserEventTriggered(AbstractChannelHandlerContext.java:294)
> at 
> 

[jira] [Closed] (FLINK-5007) Retain externalized checkpoint on suspension

2016-12-13 Thread Ufuk Celebi (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-5007?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ufuk Celebi closed FLINK-5007.
--
Resolution: Fixed

Fixed in 38ab716 (master).

> Retain externalized checkpoint on suspension
> 
>
> Key: FLINK-5007
> URL: https://issues.apache.org/jira/browse/FLINK-5007
> Project: Flink
>  Issue Type: Bug
>  Components: State Backends, Checkpointing
>Reporter: Ufuk Celebi
>Assignee: Ufuk Celebi
> Fix For: 1.2.0
>
>
> Externalized checkpoints are cleaned up when the job is suspended. 
> Suspensions happen on graceful shutdown (non-HA) or loss of leadership (HA).
> In case of HA, the checkpoint store does not clean up any checkpoints as they 
> might be recovered by a new leader. The only way to stop an HA job is to 
> actually cancel it. Therefore the configured cleanup behaviour doesn't matter.
> In case of non-HA, suspensions happen because of graceful shutdown (for 
> example, stopping a YARN session). In this case I would treat the cleanup 
> behaviour similarly to cancelling the job.
> {code}
> ExternalizedCheckpointCleanup.DELETE_ON_CANCELLATION => delete on suspension
> ExternalizedCheckpointCleanup.RETAIN_ON_CANCELLATION => retain on suspension
> {code}
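The proposed mapping can be sketched as a small decision rule. This is a hypothetical illustration, not Flink's actual API: the `ExternalizedCheckpointCleanup` constants match the mode names above, but `shouldDelete` and `JobStatus` are made-up names for the sketch.

```java
// Sketch of the proposed rule: a job suspension follows the same cleanup
// setting as a cancellation. All names besides the two cleanup modes are
// hypothetical and exist only for this illustration.
enum ExternalizedCheckpointCleanup {
    DELETE_ON_CANCELLATION,
    RETAIN_ON_CANCELLATION
}

enum JobStatus { CANCELED, SUSPENDED, FINISHED }

public class CleanupPolicyDemo {

    /** Returns true if the externalized checkpoint should be deleted. */
    static boolean shouldDelete(ExternalizedCheckpointCleanup cleanup, JobStatus status) {
        switch (status) {
            case CANCELED:
            case SUSPENDED: // proposed: suspension reuses the cancellation setting
                return cleanup == ExternalizedCheckpointCleanup.DELETE_ON_CANCELLATION;
            default:
                return false; // e.g. FINISHED checkpoints are handled elsewhere
        }
    }

    public static void main(String[] args) {
        System.out.println(shouldDelete(
            ExternalizedCheckpointCleanup.RETAIN_ON_CANCELLATION, JobStatus.SUSPENDED)); // false
        System.out.println(shouldDelete(
            ExternalizedCheckpointCleanup.DELETE_ON_CANCELLATION, JobStatus.SUSPENDED)); // true
    }
}
```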





[jira] [Closed] (FLINK-5088) Add option to limit subpartition queue length

2016-12-12 Thread Ufuk Celebi (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-5088?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ufuk Celebi closed FLINK-5088.
--
Resolution: Won't Fix

> Add option to limit subpartition queue length
> -
>
> Key: FLINK-5088
> URL: https://issues.apache.org/jira/browse/FLINK-5088
> Project: Flink
>  Issue Type: Improvement
>  Components: Network
>Affects Versions: 1.1.3
>Reporter: Ufuk Celebi
>Assignee: Ufuk Celebi
>
> Currently the subpartition queues are not bounded. Queued buffers are 
> consumed by the network event loop or local consumers.





[jira] [Closed] (FLINK-5114) PartitionState update with finished execution fails

2016-12-12 Thread Ufuk Celebi (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-5114?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ufuk Celebi closed FLINK-5114.
--
   Resolution: Fixed
Fix Version/s: 1.1.4
   1.2.0

Fixed in a078666 (master), 2b612f2 (release-1.1)

> PartitionState update with finished execution fails
> ---
>
> Key: FLINK-5114
> URL: https://issues.apache.org/jira/browse/FLINK-5114
> Project: Flink
>  Issue Type: Bug
>  Components: Network
>Reporter: Ufuk Celebi
>Assignee: Ufuk Celebi
> Fix For: 1.2.0, 1.1.4
>
>
> If a partition state request is triggered for a producer that finishes before 
> the request arrives, the execution is unregistered and the producer cannot be 
> found. In this case the PartitionState returns null and the job fails.
> In this case, we need to check the producer location via the intermediate 
> result partition instead.
> See here: https://api.travis-ci.org/jobs/177668505/log.txt?deansi=true





[jira] [Updated] (FLINK-5264) Improve error message when assigning uid to intermediate operator

2016-12-12 Thread Ufuk Celebi (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-5264?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ufuk Celebi updated FLINK-5264:
---
Priority: Major  (was: Blocker)

> Improve error message when assigning uid to intermediate operator
> -
>
> Key: FLINK-5264
> URL: https://issues.apache.org/jira/browse/FLINK-5264
> Project: Flink
>  Issue Type: Bug
>Affects Versions: 1.1.3
>Reporter: Kostas Kloudas
>Assignee: Ufuk Celebi
> Fix For: 1.2.0
>
>
> Currently, when trying to assign a uid to an intermediate operator in a 
> chain, the error message just says that it is not the right place, without 
> any further information.
> We could improve the message by explicitly telling the user which operator 
> the uid should be assigned to.





[jira] [Closed] (FLINK-4797) Add Flink 1.1 savepoint backwards compatibility

2016-12-12 Thread Ufuk Celebi (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-4797?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ufuk Celebi closed FLINK-4797.
--
Resolution: Duplicate

> Add Flink 1.1 savepoint backwards compatibility
> ---
>
> Key: FLINK-4797
> URL: https://issues.apache.org/jira/browse/FLINK-4797
> Project: Flink
>  Issue Type: Improvement
>  Components: State Backends, Checkpointing
>Reporter: Ufuk Celebi
>Assignee: Ufuk Celebi
>
> Make sure that we can resume from Flink 1.1 savepoints on the job manager 
> side. This means that we need to be able to read the savepoint header file on 
> the job manager and create a completed checkpoint from it.
> This will not yet mean that we can resume from 1.1 savepoints as the 
> operator/task manager side compatibility is missing.





[jira] [Commented] (FLINK-5041) Implement savepoint backwards compatibility 1.1 -> 1.2

2016-12-12 Thread Ufuk Celebi (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-5041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15741496#comment-15741496
 ] 

Ufuk Celebi commented on FLINK-5041:


This issue and its sub-tasks duplicate FLINK-4797, which should have been 
closed before creating these.

> Implement savepoint backwards compatibility 1.1 -> 1.2
> --
>
> Key: FLINK-5041
> URL: https://issues.apache.org/jira/browse/FLINK-5041
> Project: Flink
>  Issue Type: New Feature
>  Components: State Backends, Checkpointing
>Affects Versions: 1.2.0
>Reporter: Stefan Richter
>Assignee: Stefan Richter
>
> This issue tracks the implementation of backwards compatibility between Flink 
> 1.1 and 1.2 releases.
> This task subsumes:
> - Converting old savepoints to new savepoints, including a conversion of 
> state handles to their new replacement.
> - Converting keyed state from old backend implementations to their new 
> counterparts.
> - Converting operator and function state for all changed operators.
> - Ensure backwards compatibility of the hashes used to generate JobVertexIds.





[jira] [Commented] (FLINK-5306) Display checkpointing configuration details in web UI "Configuration" tab

2016-12-09 Thread Ufuk Celebi (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-5306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15735100#comment-15735100
 ] 

Ufuk Celebi commented on FLINK-5306:


No, I didn't open an issue yet, but I'll make this a sub-task.

> Display checkpointing configuration details in web UI "Configuration" tab
> -
>
> Key: FLINK-5306
> URL: https://issues.apache.org/jira/browse/FLINK-5306
> Project: Flink
>  Issue Type: Improvement
>  Components: Webfrontend
>Affects Versions: 1.1.3
>Reporter: Robert Metzger
>Assignee: Ufuk Celebi
>
> I wanted to check the checkpointing mode from the web UI, but it's not 
> displayed there.
> I think there are quite a few job-wide settings we can show there.





[jira] [Commented] (FLINK-5306) Display checkpointing configuration details in web UI "Configuration" tab

2016-12-09 Thread Ufuk Celebi (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-5306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15735071#comment-15735071
 ] 

Ufuk Celebi commented on FLINK-5306:


I planned to make this part of the stats refactoring. Stay tuned... ;)

> Display checkpointing configuration details in web UI "Configuration" tab
> -
>
> Key: FLINK-5306
> URL: https://issues.apache.org/jira/browse/FLINK-5306
> Project: Flink
>  Issue Type: Improvement
>  Components: Webfrontend
>Affects Versions: 1.1.3
>Reporter: Robert Metzger
>Assignee: Ufuk Celebi
>
> I wanted to check the checkpointing mode from the web UI, but it's not 
> displayed there.
> I think there are quite a few job-wide settings we can show there.





[jira] [Assigned] (FLINK-5306) Display checkpointing configuration details in web UI "Configuration" tab

2016-12-09 Thread Ufuk Celebi (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-5306?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ufuk Celebi reassigned FLINK-5306:
--

Assignee: Ufuk Celebi

> Display checkpointing configuration details in web UI "Configuration" tab
> -
>
> Key: FLINK-5306
> URL: https://issues.apache.org/jira/browse/FLINK-5306
> Project: Flink
>  Issue Type: Improvement
>  Components: Webfrontend
>Affects Versions: 1.1.3
>Reporter: Robert Metzger
>Assignee: Ufuk Celebi
>
> I wanted to check the checkpointing mode from the web UI, but it's not 
> displayed there.
> I think there are quite a few job-wide settings we can show there.





[jira] [Created] (FLINK-5302) Log failure cause at Execution

2016-12-08 Thread Ufuk Celebi (JIRA)
Ufuk Celebi created FLINK-5302:
--

 Summary: Log failure cause at Execution 
 Key: FLINK-5302
 URL: https://issues.apache.org/jira/browse/FLINK-5302
 Project: Flink
  Issue Type: Improvement
Reporter: Ufuk Celebi
 Fix For: 1.2.0, 1.1.4


It can be helpful to log the failure cause that made an {{Execution}} switch to 
state {{FAILED}}. We currently only see a "root cause" logged on the 
JobManager, which happens to be the first failure cause that makes it to 
{{ExecutionGraph#fail()}}. This depends on relative timings of messages. For 
debugging, having all causes available would be useful.
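A minimal sketch of the idea: log the cause at the point of the state transition itself, so every failure is visible rather than only the first one to reach {{ExecutionGraph#fail()}}. The `ExecutionSketch` class below is hypothetical and only mimics the relevant part of Flink's `Execution`.

```java
import java.util.logging.Logger;

// Hypothetical stand-in for Flink's Execution: on the transition to FAILED,
// the individual failure cause is logged locally instead of relying on the
// JobManager's single "root cause" log line.
class ExecutionSketch {
    private static final Logger LOG = Logger.getLogger(ExecutionSketch.class.getName());

    private String state = "RUNNING";
    Throwable failureCause;

    void markFailed(Throwable t) {
        this.state = "FAILED";
        this.failureCause = t;
        // Proposed improvement: every Execution logs its own cause here.
        LOG.info("Execution switched from RUNNING to FAILED: " + t);
    }

    String state() {
        return state;
    }
}

public class ExecutionLoggingDemo {
    public static void main(String[] args) {
        ExecutionSketch execution = new ExecutionSketch();
        execution.markFailed(new RuntimeException("task failure"));
        System.out.println(execution.state()); // FAILED
    }
}
```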





[jira] [Closed] (FLINK-3902) Discarded FileSystem checkpoints are lingering around

2016-12-08 Thread Ufuk Celebi (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-3902?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ufuk Celebi closed FLINK-3902.
--
Resolution: Not A Bug

This is actually an artifact of the way we try to delete files and how this 
interacts with HDFS (which logs this as an Exception).

> Discarded FileSystem checkpoints are lingering around
> -
>
> Key: FLINK-3902
> URL: https://issues.apache.org/jira/browse/FLINK-3902
> Project: Flink
>  Issue Type: Bug
>  Components: Distributed Coordination
>Affects Versions: 1.0.2
>Reporter: Ufuk Celebi
>
> A user reported that checkpoints with {{FSStateBackend}} are not properly 
> cleaned up.
> {code}
> 2016-05-10 12:21:06,559 INFO BlockStateChange: BLOCK* addToInvalidates: 
> blk_1084791727_11053122 10.10.113.10:50010
> 2016-05-10 12:21:06,559 INFO org.apache.hadoop.ipc.Server: IPC Server handler 
> 9 on 8020, call org.apache.hadoop.hdfs.protocol.ClientProtocol.delete from 
> 10.10.113.9:49233 Call#12337 Retry#0
> org.apache.hadoop.fs.PathIsNotEmptyDirectoryException: 
> `/flink/checkpoints_test/570d6e67d571c109daab468e5678402b/chk-62 is non 
> empty': Directory is not empty
> at 
> org.apache.hadoop.hdfs.server.namenode.FSDirDeleteOp.delete(FSDirDeleteOp.java:85)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.delete(FSNamesystem.java:3712)
> {code}
> {code}
> 2016-05-10 12:20:22,636 [Checkpoint Timer] INFO 
> org.apache.flink.runtime.checkpoint.CheckpointCoordinator - Triggering 
> checkpoint 62 @ 1462875622636
> 2016-05-10 12:20:32,507 [flink-akka.actor.default-dispatcher-240088] INFO  
> org.apache.flink.runtime.checkpoint.CheckpointCoordinator - Completed 
> checkpoint 62 (in 9843 ms)
> 2016-05-10 12:20:52,637 [Checkpoint Timer] INFO 
> org.apache.flink.runtime.checkpoint.CheckpointCoordinator - Triggering 
> checkpoint 63 @ 1462875652637
> 2016-05-10 12:21:06,563 [flink-akka.actor.default-dispatcher-240028] INFO  
> org.apache.flink.runtime.checkpoint.CheckpointCoordinator - Completed 
> checkpoint 63 (in 13909 ms)
> 2016-05-10 12:21:22,636 [Checkpoint Timer] INFO 
> org.apache.flink.runtime.checkpoint.CheckpointCoordinator - Triggering 
> checkpoint 64 @ 1462875682636
> {code}
> Running the same program with the {{RocksDBBackend}} works as expected and 
> clears the old checkpoints properly.
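The "Not A Bug" resolution above comes down to how a non-recursive directory delete behaves on a non-empty directory: the local file system reports failure via a return value, while HDFS raises (and logs) `PathIsNotEmptyDirectoryException`. A self-contained local sketch of the same pattern follows; it is illustrative only, and Flink's actual cleanup code differs.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Demonstrates that a non-recursive delete of a non-empty directory fails:
// on the local FileSystem, File#delete() returns false, whereas HDFS throws
// PathIsNotEmptyDirectoryException for the same situation.
public class NonEmptyDirDeleteDemo {
    public static void main(String[] args) throws IOException {
        Path chk = Files.createTempDirectory("chk-demo");
        Path state = Files.createFile(chk.resolve("state-handle"));

        // Deleting the directory while it still contains a file fails.
        System.out.println("deleted non-empty dir: " + chk.toFile().delete()); // false

        // After removing the contents, the delete succeeds.
        Files.delete(state);
        System.out.println("deleted after emptying: " + chk.toFile().delete()); // true
    }
}
```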





[jira] [Commented] (FLINK-5278) Improve Task and checkpoint logging

2016-12-07 Thread Ufuk Celebi (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-5278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15729286#comment-15729286
 ] 

Ufuk Celebi commented on FLINK-5278:


Merged in b046038 (release-1.1). Merge for master pending.

> Improve Task and checkpoint logging 
> 
>
> Key: FLINK-5278
> URL: https://issues.apache.org/jira/browse/FLINK-5278
> Project: Flink
>  Issue Type: Improvement
>  Components: State Backends, Checkpointing, TaskManager
>Affects Versions: 1.2.0, 1.1.3
>Reporter: Till Rohrmann
>Assignee: Till Rohrmann
>Priority: Minor
> Fix For: 1.2.0, 1.1.4
>
>
> The logging of task and checkpoint logic could be improved to contain more 
> information relevant for debugging.





[jira] [Created] (FLINK-5279) Improve error message when trying to access keyed state in non-keyed operator

2016-12-07 Thread Ufuk Celebi (JIRA)
Ufuk Celebi created FLINK-5279:
--

 Summary: Improve error message when trying to access keyed state 
in non-keyed operator
 Key: FLINK-5279
 URL: https://issues.apache.org/jira/browse/FLINK-5279
 Project: Flink
  Issue Type: Improvement
Affects Versions: 1.1.3
Reporter: Ufuk Celebi


When trying to access keyed state in a non-keyed operator, the error message is 
not very helpful. You get a trace like this:
{code}
java.lang.RuntimeException: Error while getting state
...
Caused by: java.lang.RuntimeException: State key serializer has not been 
configured in the config. This operation cannot use partitioned state.
{code}

It would be helpful if this were more explicit, stating that the API can only 
be used on keyed streams, etc.

If this applies to the current master as well, we should fix it there, too.





[jira] [Closed] (FLINK-5274) LocalInputChannel throws NPE if partition reader is released

2016-12-07 Thread Ufuk Celebi (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-5274?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ufuk Celebi closed FLINK-5274.
--
   Resolution: Fixed
Fix Version/s: 1.1.4
   1.2.0

555a687 (master), 1b472d2 (release-1.1)

> LocalInputChannel throws NPE if partition reader is released
> 
>
> Key: FLINK-5274
> URL: https://issues.apache.org/jira/browse/FLINK-5274
> Project: Flink
>  Issue Type: Bug
>  Components: Network
>Reporter: Ufuk Celebi
>Assignee: Ufuk Celebi
> Fix For: 1.2.0, 1.1.4
>
>
> Reported by [~rmetzger]:
> {code}
> java.lang.NullPointerException
>at 
> org.apache.flink.runtime.io.network.partition.consumer.LocalInputChannel.getNextBuffer(LocalInputChannel.java:185)
>at 
> org.apache.flink.runtime.io.network.partition.consumer.SingleInputGate.getNextBufferOrEvent(SingleInputGate.java:444)
>at 
> org.apache.flink.streaming.runtime.io.BarrierBuffer.getNextNonBlocked(BarrierBuffer.java:152)
>at 
> org.apache.flink.streaming.runtime.io.StreamInputProcessor.processInput(StreamInputProcessor.java:195)
>at 
> org.apache.flink.streaming.runtime.tasks.OneInputStreamTask.run(OneInputStreamTask.java:67)
>at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:267)
>at org.apache.flink.runtime.taskmanager.Task.run(Task.java:638)
>at java.lang.Thread.run(Thread.java:745)
> {code}
> This is most likely caused by the result partition being released after the 
> notifications have happened.





[jira] [Closed] (FLINK-5275) InputChannelDeploymentDescriptors throws misleading Exception if producer failed/cancelled

2016-12-07 Thread Ufuk Celebi (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-5275?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ufuk Celebi closed FLINK-5275.
--
   Resolution: Fixed
Fix Version/s: 1.1.4
   1.2.0

4410c04 (master), 4526005 (release-1.1)

> InputChannelDeploymentDescriptors throws misleading Exception if producer 
> failed/cancelled
> -
>
> Key: FLINK-5275
> URL: https://issues.apache.org/jira/browse/FLINK-5275
> Project: Flink
>  Issue Type: Bug
>  Components: Network
>Reporter: Ufuk Celebi
>Assignee: Ufuk Celebi
> Fix For: 1.2.0, 1.1.4
>
>
> Reported by [~rmetzger]. While creating the input channel deployment 
> descriptors for a job that does not allow lazy deployment (all streaming 
> jobs), Robert got an {{ExecutionGraphException}} with the message {{Trying to 
> eagerly schedule a task whose inputs are not ready}} in the 
> {{InputChannelDeploymentDescriptor}}.
> This was not reported as the root cause and is most likely due to a failed 
> producer. For such cases, it makes sense to include the execution state of 
> the producing task so that users see that this is related to the producer.





[jira] [Closed] (FLINK-5276) ExecutionVertex archiving can throw NPE with many previous attempts

2016-12-07 Thread Ufuk Celebi (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-5276?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ufuk Celebi closed FLINK-5276.
--
Resolution: Fixed

Fixed in 75b48e (release-1.1)

> ExecutionVertex archiving can throw NPE with many previous attempts
> ---
>
> Key: FLINK-5276
> URL: https://issues.apache.org/jira/browse/FLINK-5276
> Project: Flink
>  Issue Type: Bug
>  Components: JobManager
>Reporter: Ufuk Celebi
>Assignee: Ufuk Celebi
> Fix For: 1.1.4
>
>
> I saw an NPE while archiving an ExecutionVertex:
> {code}
> execution graph 
> org.apache.flink.runtime.executiongraph.ExecutionGraph@c4e0722 for archiving.
> java.lang.NullPointerException: null
>   at 
> org.apache.flink.runtime.executiongraph.ExecutionVertex.prepareForArchiving(ExecutionVertex.java:583)
>   at 
> org.apache.flink.runtime.executiongraph.ExecutionJobVertex.prepareForArchiving(ExecutionJobVertex.java:439)
>   at 
> org.apache.flink.runtime.executiongraph.ExecutionGraph.prepareForArchiving(ExecutionGraph.java:1042)
> ...
> {code}
> I think this is due to the newly introduced {{EvictingBoundedList}} which 
> returns a default element ({{null}}) for evicted elements when iterating 
> over it. 
> This affects the backport to release-1.1 only.
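The suspected failure mode can be reproduced with a self-contained sketch. `EvictingBoundedListSketch` below is a hypothetical class that only mimics the described behavior of Flink's `EvictingBoundedList`: iterating over all logical indices yields `null` for evicted entries, which a caller such as the archiving code may then dereference.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical bounded list that keeps only the newest 'capacity' elements
// and returns a default value (null) for indices that were evicted.
class EvictingBoundedListSketch<T> {
    private final Object[] ring; // holds only the newest 'capacity' elements
    private int size;            // total number of elements ever added

    EvictingBoundedListSketch(int capacity) {
        this.ring = new Object[capacity];
    }

    void add(T element) {
        ring[size % ring.length] = element;
        size++;
    }

    /** Returns null (the default element) for evicted indices. */
    @SuppressWarnings("unchecked")
    T get(int index) {
        if (index < size - ring.length) {
            return null; // evicted: the caller must expect a default element
        }
        return (T) ring[index % ring.length];
    }

    int size() {
        return size;
    }
}

public class EvictingListDemo {
    public static void main(String[] args) {
        EvictingBoundedListSketch<String> attempts = new EvictingBoundedListSketch<>(2);
        attempts.add("attempt-0");
        attempts.add("attempt-1");
        attempts.add("attempt-2"); // evicts attempt-0

        // Naive iteration over all logical indices now sees a null at index 0;
        // code that assumes every element is non-null will throw an NPE.
        List<String> seen = new ArrayList<>();
        for (int i = 0; i < attempts.size(); i++) {
            seen.add(attempts.get(i));
        }
        System.out.println(seen); // [null, attempt-1, attempt-2]
    }
}
```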





[jira] [Assigned] (FLINK-5276) ExecutionVertex archiving can throw NPE with many previous attempts

2016-12-07 Thread Ufuk Celebi (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-5276?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ufuk Celebi reassigned FLINK-5276:
--

Assignee: Ufuk Celebi  (was: Stefan Richter)

> ExecutionVertex archiving can throw NPE with many previous attempts
> ---
>
> Key: FLINK-5276
> URL: https://issues.apache.org/jira/browse/FLINK-5276
> Project: Flink
>  Issue Type: Bug
>  Components: JobManager
>Reporter: Ufuk Celebi
>Assignee: Ufuk Celebi
> Fix For: 1.1.4
>
>
> I saw an NPE while archiving an ExecutionVertex:
> {code}
> execution graph 
> org.apache.flink.runtime.executiongraph.ExecutionGraph@c4e0722 for archiving.
> java.lang.NullPointerException: null
>   at 
> org.apache.flink.runtime.executiongraph.ExecutionVertex.prepareForArchiving(ExecutionVertex.java:583)
>   at 
> org.apache.flink.runtime.executiongraph.ExecutionJobVertex.prepareForArchiving(ExecutionJobVertex.java:439)
>   at 
> org.apache.flink.runtime.executiongraph.ExecutionGraph.prepareForArchiving(ExecutionGraph.java:1042)
> ...
> {code}
> I think this is due to the newly introduced {{EvictingBoundedList}} which 
> returns a default element ({{null}}) for evicted elements when iterating 
> over it. 
> This affects the backport to release-1.1 only.





[jira] [Updated] (FLINK-5276) ExecutionVertex archiving can throw NPE with many previous attempts

2016-12-07 Thread Ufuk Celebi (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-5276?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ufuk Celebi updated FLINK-5276:
---
Fix Version/s: (was: 1.2.0)

> ExecutionVertex archiving can throw NPE with many previous attempts
> ---
>
> Key: FLINK-5276
> URL: https://issues.apache.org/jira/browse/FLINK-5276
> Project: Flink
>  Issue Type: Bug
>  Components: JobManager
>Reporter: Ufuk Celebi
>Assignee: Stefan Richter
> Fix For: 1.1.4
>
>
> I saw an NPE while archiving an ExecutionVertex:
> {code}
> execution graph 
> org.apache.flink.runtime.executiongraph.ExecutionGraph@c4e0722 for archiving.
> java.lang.NullPointerException: null
>   at 
> org.apache.flink.runtime.executiongraph.ExecutionVertex.prepareForArchiving(ExecutionVertex.java:583)
>   at 
> org.apache.flink.runtime.executiongraph.ExecutionJobVertex.prepareForArchiving(ExecutionJobVertex.java:439)
>   at 
> org.apache.flink.runtime.executiongraph.ExecutionGraph.prepareForArchiving(ExecutionGraph.java:1042)
> ...
> {code}
> I think this is due to the newly introduced {{EvictingBoundedList}} which 
> returns a default element ({{null}}) for evicted elements when iterating 
> over it. 





[jira] [Updated] (FLINK-5276) ExecutionVertex archiving can throw NPE with many previous attempts

2016-12-07 Thread Ufuk Celebi (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-5276?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ufuk Celebi updated FLINK-5276:
---
Description: 
I saw an NPE while archiving an ExecutionVertex:
{code}
execution graph org.apache.flink.runtime.executiongraph.ExecutionGraph@c4e0722 
for archiving.
java.lang.NullPointerException: null
at 
org.apache.flink.runtime.executiongraph.ExecutionVertex.prepareForArchiving(ExecutionVertex.java:583)
at 
org.apache.flink.runtime.executiongraph.ExecutionJobVertex.prepareForArchiving(ExecutionJobVertex.java:439)
at 
org.apache.flink.runtime.executiongraph.ExecutionGraph.prepareForArchiving(ExecutionGraph.java:1042)
...
{code}

I think this is due to the newly introduced {{EvictingBoundedList}}, which 
returns a default element ({{null}}) for evicted elements when iterating over 
it.

This affects the backport to release-1.1 only.

  was:
I saw an NPE while archiving an ExecutionVertex:
{code}
execution graph org.apache.flink.runtime.executiongraph.ExecutionGraph@c4e0722 
for archiving.
java.lang.NullPointerException: null
at 
org.apache.flink.runtime.executiongraph.ExecutionVertex.prepareForArchiving(ExecutionVertex.java:583)
at 
org.apache.flink.runtime.executiongraph.ExecutionJobVertex.prepareForArchiving(ExecutionJobVertex.java:439)
at 
org.apache.flink.runtime.executiongraph.ExecutionGraph.prepareForArchiving(ExecutionGraph.java:1042)
...
{code}

I think this is due to the newly introduced {{EvictingBoundedList}}, which 
returns a default element ({{null}}) for evicted elements when iterating over 
it.


> ExecutionVertex archiving can throw NPE with many previous attempts
> ---
>
> Key: FLINK-5276
> URL: https://issues.apache.org/jira/browse/FLINK-5276
> Project: Flink
>  Issue Type: Bug
>  Components: JobManager
>Reporter: Ufuk Celebi
>Assignee: Stefan Richter
> Fix For: 1.1.4
>
>
> I saw an NPE while archiving an ExecutionVertex:
> {code}
> execution graph 
> org.apache.flink.runtime.executiongraph.ExecutionGraph@c4e0722 for archiving.
> java.lang.NullPointerException: null
>   at 
> org.apache.flink.runtime.executiongraph.ExecutionVertex.prepareForArchiving(ExecutionVertex.java:583)
>   at 
> org.apache.flink.runtime.executiongraph.ExecutionJobVertex.prepareForArchiving(ExecutionJobVertex.java:439)
>   at 
> org.apache.flink.runtime.executiongraph.ExecutionGraph.prepareForArchiving(ExecutionGraph.java:1042)
> ...
> {code}
> I think this is due to the newly introduced {{EvictingBoundedList}}, which 
> returns a default element ({{null}}) for evicted elements when iterating 
> over it.
> This affects the backport to release-1.1 only.





[jira] [Commented] (FLINK-5041) Implement savepoint backwards compatibility 1.1 -> 1.2

2016-12-07 Thread Ufuk Celebi (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-5041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15728721#comment-15728721
 ] 

Ufuk Celebi commented on FLINK-5041:


Subtasks FLINK-5042 and FLINK-5043 implemented in af3bf83 (master).

> Implement savepoint backwards compatibility 1.1 -> 1.2
> --
>
> Key: FLINK-5041
> URL: https://issues.apache.org/jira/browse/FLINK-5041
> Project: Flink
>  Issue Type: New Feature
>  Components: State Backends, Checkpointing
>Affects Versions: 1.2.0
>Reporter: Stefan Richter
>Assignee: Stefan Richter
>
> This issue tracks the implementation of backwards compatibility between Flink 
> 1.1 and 1.2 releases.
> This task subsumes:
> - Converting old savepoints to new savepoints, including a conversion of 
> state handles to their new replacement.
> - Converting keyed state from old backend implementations to their new 
> counterparts.
> - Converting operator and function state for all changed operators.





[jira] [Closed] (FLINK-5043) Converting keyed state from Flink 1.1 backend implementations to their new counterparts in 1.2

2016-12-07 Thread Ufuk Celebi (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-5043?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ufuk Celebi closed FLINK-5043.
--
   Resolution: Implemented
Fix Version/s: 1.2.0

Implemented in af3bf83 (master).

> Converting keyed state from Flink 1.1 backend implementations to their new 
> counterparts in 1.2
> --
>
> Key: FLINK-5043
> URL: https://issues.apache.org/jira/browse/FLINK-5043
> Project: Flink
>  Issue Type: Sub-task
>  Components: State Backends, Checkpointing
>Affects Versions: 1.2.0
>Reporter: Stefan Richter
>Assignee: Stefan Richter
> Fix For: 1.2.0
>
>
> Keyed state backends became key-group aware in Flink 1.2, and their hierarchy 
> as a whole changed significantly. We need to implement a conversion so that 
> old snapshots can be restored into new backends.





[jira] [Closed] (FLINK-5042) Convert old savepoints to new savepoints

2016-12-07 Thread Ufuk Celebi (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-5042?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ufuk Celebi closed FLINK-5042.
--
   Resolution: Implemented
Fix Version/s: 1.2.0

Implemented in af3bf83 (master).

> Convert old savepoints to new savepoints
> 
>
> Key: FLINK-5042
> URL: https://issues.apache.org/jira/browse/FLINK-5042
> Project: Flink
>  Issue Type: Sub-task
>  Components: State Backends, Checkpointing
>Affects Versions: 1.2.0
>Reporter: Stefan Richter
>Assignee: Stefan Richter
> Fix For: 1.2.0
>
>
> The format of savepoints and the hierarchy of state handles changed a lot 
> between Flink 1.1 and 1.2. For backwards compatibility, we need to convert 
> old to new savepoints.





[jira] [Created] (FLINK-5276) ExecutionVertex archiving can throw NPE with many previous attempts

2016-12-07 Thread Ufuk Celebi (JIRA)
Ufuk Celebi created FLINK-5276:
--

 Summary: ExecutionVertex archiving can throw NPE with many 
previous attempts
 Key: FLINK-5276
 URL: https://issues.apache.org/jira/browse/FLINK-5276
 Project: Flink
  Issue Type: Bug
  Components: JobManager
Reporter: Ufuk Celebi
Assignee: Stefan Richter
 Fix For: 1.2.0, 1.1.4


I saw an NPE while archiving an ExecutionVertex:
{code}
execution graph org.apache.flink.runtime.executiongraph.ExecutionGraph@c4e0722 
for archiving.
java.lang.NullPointerException: null
at 
org.apache.flink.runtime.executiongraph.ExecutionVertex.prepareForArchiving(ExecutionVertex.java:583)
at 
org.apache.flink.runtime.executiongraph.ExecutionJobVertex.prepareForArchiving(ExecutionJobVertex.java:439)
at 
org.apache.flink.runtime.executiongraph.ExecutionGraph.prepareForArchiving(ExecutionGraph.java:1042)
...
{code}

I think this is due to the newly introduced {{EvictingBoundedList}}, which 
returns a default element ({{null}}) for evicted elements when iterating over 
it.
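A minimal sketch of the suspected failure mode, assuming a bounded ring buffer that hands out {{null}} for evicted slots. The class and method names here are hypothetical stand-ins, not Flink's actual {{EvictingBoundedList}}:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical bounded list: evicted positions yield null on lookup.
class BoundedListSketch<T> {
    private final Object[] ring;
    private int count = 0; // total number of elements ever added

    BoundedListSketch(int capacity) {
        this.ring = new Object[capacity];
    }

    void add(T element) {
        ring[count++ % ring.length] = element;
    }

    // Evicted positions return the default element: null.
    @SuppressWarnings("unchecked")
    T get(int index) {
        return (index < count - ring.length) ? null : (T) ring[index % ring.length];
    }

    int size() {
        return count;
    }
}

public class ArchivingSketch {
    public static void main(String[] args) {
        BoundedListSketch<String> attempts = new BoundedListSketch<>(2);
        attempts.add("attempt-0");
        attempts.add("attempt-1");
        attempts.add("attempt-2"); // evicts attempt-0

        List<Integer> archivedLengths = new ArrayList<>();
        for (int i = 0; i < attempts.size(); i++) {
            String attempt = attempts.get(i);
            // Without this null check, attempt.length() on the evicted slot
            // throws the NullPointerException described above.
            if (attempt != null) {
                archivedLengths.add(attempt.length());
            }
        }
        System.out.println(archivedLengths.size()); // prints 2
    }
}
```

Archiving code that iterates such a list with many previous attempts must tolerate the null default, which matches the guard the fix would need.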





[jira] [Created] (FLINK-5275) InputChannelDeploymentDescriptors throws misleading Exception if producer failed/cancelled

2016-12-07 Thread Ufuk Celebi (JIRA)
Ufuk Celebi created FLINK-5275:
--

 Summary: InputChannelDeploymentDescriptors throws misleading 
Exception if producer failed/cancelled
 Key: FLINK-5275
 URL: https://issues.apache.org/jira/browse/FLINK-5275
 Project: Flink
  Issue Type: Bug
  Components: Network
Reporter: Ufuk Celebi
Assignee: Ufuk Celebi


Reported by [~rmetzger]. While creating the input channel deployment 
descriptors for a job that does not allow lazy deployment (all streaming jobs), 
Robert got an {{ExecutionGraphException}} with the message {{Trying to eagerly 
schedule a task whose inputs are not ready}} in the 
{{InputChannelDeploymentDescriptor}}.

This was not reported as the root cause and is most likely due to a failed 
producer. For such cases, it makes sense to include the execution state of the 
producing task so that users see that this is related to the producer.





[jira] [Created] (FLINK-5274) LocalInputChannel throws NPE if partition reader is released

2016-12-07 Thread Ufuk Celebi (JIRA)
Ufuk Celebi created FLINK-5274:
--

 Summary: LocalInputChannel throws NPE if partition reader is 
released
 Key: FLINK-5274
 URL: https://issues.apache.org/jira/browse/FLINK-5274
 Project: Flink
  Issue Type: Bug
  Components: Network
Reporter: Ufuk Celebi
Assignee: Ufuk Celebi


Reported by [~rmetzger]:

{code}
java.lang.NullPointerException
   at 
org.apache.flink.runtime.io.network.partition.consumer.LocalInputChannel.getNextBuffer(LocalInputChannel.java:185)
   at 
org.apache.flink.runtime.io.network.partition.consumer.SingleInputGate.getNextBufferOrEvent(SingleInputGate.java:444)
   at 
org.apache.flink.streaming.runtime.io.BarrierBuffer.getNextNonBlocked(BarrierBuffer.java:152)
   at 
org.apache.flink.streaming.runtime.io.StreamInputProcessor.processInput(StreamInputProcessor.java:195)
   at 
org.apache.flink.streaming.runtime.tasks.OneInputStreamTask.run(OneInputStreamTask.java:67)
   at 
org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:267)
   at org.apache.flink.runtime.taskmanager.Task.run(Task.java:638)
   at java.lang.Thread.run(Thread.java:745)
{code}

This is most likely caused by the result partition being released after the 
notifications have happened.
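One way to turn such an NPE into a descriptive failure is to read the possibly-released view once and check it before use. This is a sketch only; the field and method names are assumptions, not Flink's actual {{LocalInputChannel}} code:

```java
// Hypothetical sketch: guard against a concurrently released partition reader.
public class LocalChannelSketch {
    interface SubpartitionView {
        String getNextBuffer();
    }

    // Set to null when the producing partition is released.
    private volatile SubpartitionView subpartitionView;

    LocalChannelSketch(SubpartitionView view) {
        this.subpartitionView = view;
    }

    void release() {
        subpartitionView = null;
    }

    String getNextBuffer() {
        // Single read into a local: avoids a racy second field lookup.
        SubpartitionView view = subpartitionView;
        if (view == null) {
            // Descriptive error instead of a bare NullPointerException.
            throw new IllegalStateException(
                "Queried for a buffer after the partition reader was released.");
        }
        return view.getNextBuffer();
    }

    public static void main(String[] args) {
        LocalChannelSketch channel = new LocalChannelSketch(() -> "buffer-0");
        System.out.println(channel.getNextBuffer()); // prints buffer-0
        channel.release();
        try {
            channel.getNextBuffer();
        } catch (IllegalStateException e) {
            System.out.println("released"); // prints released
        }
    }
}
```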





[jira] [Closed] (FLINK-5179) MetricRegistry life-cycle issues with HA

2016-12-06 Thread Ufuk Celebi (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-5179?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ufuk Celebi closed FLINK-5179.
--
Resolution: Fixed

Fixed in e5f4e3d (master).

> MetricRegistry life-cycle issues with HA
> 
>
> Key: FLINK-5179
> URL: https://issues.apache.org/jira/browse/FLINK-5179
> Project: Flink
>  Issue Type: Bug
>  Components: Metrics
>Affects Versions: 1.2.0
>Reporter: Chesnay Schepler
>Assignee: Chesnay Schepler
>Priority: Blocker
> Fix For: 1.2.0
>
>
> The TaskManager's MetricRegistry is started when the TaskManager is created, 
> and shut down in the TaskManager's postStop method.
> However, the registry is also shut down within the TaskManager's 
> disassociateFromJobManager method, and it is not restarted when the 
> connection is re-established.
> Effectively this means that a TaskManager that ever reconnected to a 
> JobManager will not report any metrics, since the reporters are shut down as 
> well. Metrics will no longer be sent to the WebInterface either.





[jira] [Closed] (FLINK-5261) ScheduledDropwizardReporter does not properly clean up metrics

2016-12-06 Thread Ufuk Celebi (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-5261?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ufuk Celebi closed FLINK-5261.
--
Resolution: Fixed

Fixed in f7f7b48 (master).

> ScheduledDropwizardReporter does not properly clean up metrics
> --
>
> Key: FLINK-5261
> URL: https://issues.apache.org/jira/browse/FLINK-5261
> Project: Flink
>  Issue Type: Bug
>  Components: Metrics
>Affects Versions: 1.2.0
>Reporter: Chesnay Schepler
>Assignee: Chesnay Schepler
>Priority: Blocker
> Fix For: 1.2.0
>
>
> The {{ScheduledDropwizardReporter}} does not have a separate branch for 
> meters in {{notifyOfRemovedMetric}}.
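A simplified sketch of this bug class, assuming per-type maps in a scheduled reporter. The {{Metric}} subtypes and maps are hypothetical stand-ins for the real Flink/Dropwizard classes; the point is only that every type handled in {{notifyOfAddedMetric}} needs a matching branch in {{notifyOfRemovedMetric}}:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical reporter with per-type registries for its metrics.
public class ReporterSketch {
    interface Metric {}
    static class Counter implements Metric {}
    static class Gauge implements Metric {}
    static class Meter implements Metric {}

    final Map<String, Metric> counters = new HashMap<>();
    final Map<String, Metric> gauges = new HashMap<>();
    final Map<String, Metric> meters = new HashMap<>();

    void notifyOfAddedMetric(Metric metric, String name) {
        if (metric instanceof Counter) {
            counters.put(name, metric);
        } else if (metric instanceof Gauge) {
            gauges.put(name, metric);
        } else if (metric instanceof Meter) {
            meters.put(name, metric);
        }
    }

    void notifyOfRemovedMetric(Metric metric, String name) {
        if (metric instanceof Counter) {
            counters.remove(name);
        } else if (metric instanceof Gauge) {
            gauges.remove(name);
        } else if (metric instanceof Meter) {
            // Omitting this branch is the kind of leak described above:
            // removed meters would keep being reported forever.
            meters.remove(name);
        }
    }

    public static void main(String[] args) {
        ReporterSketch reporter = new ReporterSketch();
        reporter.notifyOfAddedMetric(new Meter(), "requests");
        reporter.notifyOfRemovedMetric(new Meter(), "requests");
        System.out.println(reporter.meters.size()); // prints 0
    }
}
```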





[jira] [Commented] (FLINK-5147) StreamingOperatorsITCase.testGroupedFoldOperation failed on Travis

2016-12-06 Thread Ufuk Celebi (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-5147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15725708#comment-15725708
 ] 

Ufuk Celebi commented on FLINK-5147:


Occurred again: https://api.travis-ci.org/jobs/181647649/log.txt?deansi=true

> StreamingOperatorsITCase.testGroupedFoldOperation failed on Travis
> --
>
> Key: FLINK-5147
> URL: https://issues.apache.org/jira/browse/FLINK-5147
> Project: Flink
>  Issue Type: Bug
>  Components: Tests
>Affects Versions: 1.1.3
> Environment: https://travis-ci.org/apache/flink/jobs/177675906
>Reporter: Chesnay Schepler
>Assignee: Chesnay Schepler
>
> The test failed with the following exception:
> {code}
> testGroupedFoldOperation(org.apache.flink.test.streaming.api.StreamingOperatorsITCase)
>   Time elapsed: 0.573 sec  <<< ERROR!
> org.apache.flink.runtime.client.JobExecutionException: Job execution failed.
>   at 
> org.apache.flink.runtime.jobmanager.JobManager$$anonfun$handleMessage$1$$anonfun$applyOrElse$6.apply$mcV$sp(JobManager.scala:905)
>   at 
> org.apache.flink.runtime.jobmanager.JobManager$$anonfun$handleMessage$1$$anonfun$applyOrElse$6.apply(JobManager.scala:848)
>   at 
> org.apache.flink.runtime.jobmanager.JobManager$$anonfun$handleMessage$1$$anonfun$applyOrElse$6.apply(JobManager.scala:848)
>   at 
> scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24)
>   at 
> scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24)
>   at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:41)
>   at 
> akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:401)
>   at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
>   at 
> scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
>   at 
> scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
>   at 
> scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
> Caused by: java.lang.NullPointerException: null
>   at 
> org.apache.flink.core.fs.local.LocalFileSystem.delete(LocalFileSystem.java:187)
>   at 
> org.apache.flink.core.fs.FileSystem.initOutPathLocalFS(FileSystem.java:632)
>   at 
> org.apache.flink.api.common.io.FileOutputFormat.open(FileOutputFormat.java:239)
>   at 
> org.apache.flink.api.java.io.TextOutputFormat.open(TextOutputFormat.java:78)
>   at 
> org.apache.flink.streaming.api.functions.sink.OutputFormatSinkFunction.open(OutputFormatSinkFunction.java:60)
>   at 
> org.apache.flink.api.common.functions.util.FunctionUtils.openFunction(FunctionUtils.java:36)
>   at 
> org.apache.flink.streaming.api.operators.AbstractUdfStreamOperator.open(AbstractUdfStreamOperator.java:154)
>   at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.openAllOperators(StreamTask.java:383)
>   at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:259)
>   at org.apache.flink.runtime.taskmanager.Task.run(Task.java:650)
>   at java.lang.Thread.run(Thread.java:745)
> testGroupedFoldOperation(org.apache.flink.test.streaming.api.StreamingOperatorsITCase)
>   Time elapsed: 0.573 sec  <<< FAILURE!
> java.lang.AssertionError: Different number of lines in expected and obtained 
> result. expected:<4> but was:<1>
>   at org.junit.Assert.fail(Assert.java:88)
>   at org.junit.Assert.failNotEquals(Assert.java:834)
>   at org.junit.Assert.assertEquals(Assert.java:645)
>   at 
> org.apache.flink.test.util.TestBaseUtils.compareResultsByLinesInMemory(TestBaseUtils.java:316)
>   at 
> org.apache.flink.test.util.TestBaseUtils.compareResultsByLinesInMemory(TestBaseUtils.java:302)
>   at 
> org.apache.flink.test.streaming.api.StreamingOperatorsITCase.after(StreamingOperatorsITCase.java:63)
> {code}





[jira] [Updated] (FLINK-5264) Improve error message when assigning uid to intermediate operator.

2016-12-05 Thread Ufuk Celebi (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-5264?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ufuk Celebi updated FLINK-5264:
---
Description: 
Currently, when trying to assign a uid to an intermediate operator in a chain, 
the error message just says that it is not the right place, without any further 
information.

We could improve the message by explicitly telling the user which operator to 
assign the uid to.

  was:
Currently, when trying to assign a uid to an intermediate operator in a chain, 
the error message just says that it is not the right place, without any further 
information.

We could improve the message by explicitly telling the user which operator to 
assign the uid to.

To take it one step further, I would also suggest always asking the user to set 
a uid whenever possible, even if they did not try to set it.


> Improve error message when assigning uid to intermediate operator.
> --
>
> Key: FLINK-5264
> URL: https://issues.apache.org/jira/browse/FLINK-5264
> Project: Flink
>  Issue Type: Bug
>Affects Versions: 1.1.3
>Reporter: Kostas Kloudas
>Assignee: Ufuk Celebi
>Priority: Blocker
> Fix For: 1.2.0
>
>
> Currently, when trying to assign a uid to an intermediate operator in a 
> chain, the error message just says that it is not the right place, without 
> any further information.
> We could improve the message by explicitly telling the user which operator to 
> assign the uid to.





[jira] [Updated] (FLINK-5264) Improve error message when assigning uid to intermediate operator

2016-12-05 Thread Ufuk Celebi (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-5264?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ufuk Celebi updated FLINK-5264:
---
Summary: Improve error message when assigning uid to intermediate operator  
(was: Improve error message when assigning uid to intermediate operator.)

> Improve error message when assigning uid to intermediate operator
> -
>
> Key: FLINK-5264
> URL: https://issues.apache.org/jira/browse/FLINK-5264
> Project: Flink
>  Issue Type: Bug
>Affects Versions: 1.1.3
>Reporter: Kostas Kloudas
>Assignee: Ufuk Celebi
>Priority: Blocker
> Fix For: 1.2.0
>
>
> Currently, when trying to assign a uid to an intermediate operator in a 
> chain, the error message just says that it is not the right place, without 
> any further information.
> We could improve the message by explicitly telling the user which operator to 
> assign the uid to.





[jira] [Assigned] (FLINK-5264) Improve error message when assigning uid to intermediate operator.

2016-12-05 Thread Ufuk Celebi (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-5264?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ufuk Celebi reassigned FLINK-5264:
--

Assignee: Ufuk Celebi

> Improve error message when assigning uid to intermediate operator.
> --
>
> Key: FLINK-5264
> URL: https://issues.apache.org/jira/browse/FLINK-5264
> Project: Flink
>  Issue Type: Bug
>Affects Versions: 1.1.3
>Reporter: Kostas Kloudas
>Assignee: Ufuk Celebi
>Priority: Blocker
> Fix For: 1.2.0
>
>
> Currently, when trying to assign a uid to an intermediate operator in a 
> chain, the error message just says that it is not the right place, without 
> any further information.
> We could improve the message by explicitly telling the user which operator to 
> assign the uid to.
> To take it one step further, I would also suggest always asking the user to 
> set a uid whenever possible, even if they did not try to set it.





[jira] [Created] (FLINK-5263) Add docs about JVM sharing between tasks

2016-12-05 Thread Ufuk Celebi (JIRA)
Ufuk Celebi created FLINK-5263:
--

 Summary: Add docs about JVM sharing between tasks
 Key: FLINK-5263
 URL: https://issues.apache.org/jira/browse/FLINK-5263
 Project: Flink
  Issue Type: Improvement
  Components: Documentation
Reporter: Ufuk Celebi


I've seen the question about why and how JVMs are shared between tasks and jobs 
pop up on the mailing lists multiple times now. It makes sense to write down 
which config options affect this, how to effectively configure one JVM per task 
and job, and other related issues such as the pros and cons of Flink's approach.





[jira] [Closed] (FLINK-5248) SavepointITCase doesn't catch savepoint restore failure

2016-12-05 Thread Ufuk Celebi (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-5248?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ufuk Celebi closed FLINK-5248.
--
Resolution: Fixed

> SavepointITCase doesn't catch savepoint restore failure
> ---
>
> Key: FLINK-5248
> URL: https://issues.apache.org/jira/browse/FLINK-5248
> Project: Flink
>  Issue Type: Bug
>  Components: Tests
>Affects Versions: 1.1.3
>Reporter: Ufuk Celebi
>Assignee: Ufuk Celebi
>Priority: Critical
> Fix For: 1.2.0, 1.1.4
>
>
> The savepoint IT case does not fail when restoring from a savepoint fails on 
> the task side. It only checks that the deployment descriptors are as expected 
> when resubmitting. This is a pretty bad shortcoming.





[jira] [Closed] (FLINK-5248) SavepointITCase doesn't catch savepoint restore failure

2016-12-05 Thread Ufuk Celebi (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-5248?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ufuk Celebi closed FLINK-5248.
--
Resolution: Fixed

Fixed in 4dee8fe (master), a5065e3 (release-1.1).

> SavepointITCase doesn't catch savepoint restore failure
> ---
>
> Key: FLINK-5248
> URL: https://issues.apache.org/jira/browse/FLINK-5248
> Project: Flink
>  Issue Type: Bug
>  Components: Tests
>Affects Versions: 1.1.3
>Reporter: Ufuk Celebi
>Assignee: Ufuk Celebi
>Priority: Critical
>
> The savepoint IT case does not fail when restoring from a savepoint fails on 
> the task side. It only checks that the deployment descriptors are as expected 
> when resubmitting. This is a pretty bad shortcoming.





[jira] [Updated] (FLINK-5248) SavepointITCase doesn't catch savepoint restore failure

2016-12-05 Thread Ufuk Celebi (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-5248?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ufuk Celebi updated FLINK-5248:
---
Fix Version/s: 1.1.4
   1.2.0

> SavepointITCase doesn't catch savepoint restore failure
> ---
>
> Key: FLINK-5248
> URL: https://issues.apache.org/jira/browse/FLINK-5248
> Project: Flink
>  Issue Type: Bug
>  Components: Tests
>Affects Versions: 1.1.3
>Reporter: Ufuk Celebi
>Assignee: Ufuk Celebi
>Priority: Critical
> Fix For: 1.2.0, 1.1.4
>
>
> The savepoint IT case does not fail when restoring from a savepoint fails on 
> the task side. It only checks that the deployment descriptors are as expected 
> when resubmitting. This is a pretty bad shortcoming.




