[jira] [Commented] (FLINK-5436) UDF state without CheckpointedRestoring can result in restarting loop
[ https://issues.apache.org/jira/browse/FLINK-5436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15833467#comment-15833467 ] Ufuk Celebi commented on FLINK-5436: [~stefanrichte...@gmail.com] and [~aljoscha] should we address this and FLINK-5437 for the upcoming release or not? > UDF state without CheckpointedRestoring can result in restarting loop > - > > Key: FLINK-5436 > URL: https://issues.apache.org/jira/browse/FLINK-5436 > Project: Flink > Issue Type: Improvement > Components: State Backends, Checkpointing >Reporter: Ufuk Celebi >Priority: Minor > > When restoring a job with Checkpointed state and not implementing the new > CheckpointedRestoring interface, the job will be restarted over and over > again (given the respective restarting strategy). > Since this is not recoverable, we should immediately fail the job. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Closed] (FLINK-5530) race condition in AbstractRocksDBState#getSerializedValue
[ https://issues.apache.org/jira/browse/FLINK-5530?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ufuk Celebi closed FLINK-5530. -- Resolution: Fixed Fix Version/s: 1.3.0 1.2.0 Fixed in d8222c1 (release-1.2), d16552d (master). > race condition in AbstractRocksDBState#getSerializedValue > - > > Key: FLINK-5530 > URL: https://issues.apache.org/jira/browse/FLINK-5530 > Project: Flink > Issue Type: Bug > Components: Queryable State >Affects Versions: 1.2.0 >Reporter: Nico Kruber >Assignee: Nico Kruber >Priority: Blocker > Fix For: 1.2.0, 1.3.0 > > > AbstractRocksDBState#getSerializedValue() uses the same key serialisation > stream as the ordinary state access methods but is called in parallel during > state queries thus violating the assumption of only one thread accessing it. > This may lead to either wrong results in queries or corrupt data while > queries are executed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
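The fix pattern for a race like this is to stop sharing the mutable serialization stream across threads. The sketch below is illustrative only (class and method names are hypothetical, not Flink's actual patch): each call builds its own buffer, so parallel queryable-state lookups cannot interleave their writes.

```java
import java.io.ByteArrayOutputStream;

// Hypothetical sketch of the thread-safe pattern: instead of one stream field
// reused by all callers (the bug described above), every invocation allocates
// its own stream, so there is no shared mutable state to race on.
public class PerCallSerialization {

    // Serializes a key into a fresh, call-local buffer. Writing an int as four
    // bytes stands in for the real key serializer.
    public static byte[] serializeKey(int key) {
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        baos.write((key >>> 24) & 0xFF);
        baos.write((key >>> 16) & 0xFF);
        baos.write((key >>> 8) & 0xFF);
        baos.write(key & 0xFF);
        return baos.toByteArray();
    }
}
```

Because the method touches no shared fields, it is safe to call from the query threads and the ordinary state-access path concurrently.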
[jira] [Created] (FLINK-5602) Migration with RocksDB job led to NPE for next checkpoint
Ufuk Celebi created FLINK-5602: -- Summary: Migration with RocksDB job led to NPE for next checkpoint Key: FLINK-5602 URL: https://issues.apache.org/jira/browse/FLINK-5602 Project: Flink Issue Type: Bug Reporter: Ufuk Celebi When migrating a job with RocksDB, I got the following Exception when the next checkpoint was triggered. This only happened once and I have not been able to reproduce it since. [~stefanrichte...@gmail.com] Maybe we can look over the code and check what could have failed here? Unfortunately, I don't have more of the stack trace available. I don't think this will be very helpful, will it? {code} at org.apache.flink.util.Preconditions.checkNotNull(Preconditions.java:58) at org.apache.flink.runtime.state.KeyedBackendSerializationProxy$StateMetaInfo.<init>(KeyedBackendSerializationProxy.java:126) at org.apache.flink.contrib.streaming.state.RocksDBKeyedStateBackend$RocksDBSnapshotOperation.writeKVStateMetaData(RocksDBKeyedStateBackend.java:471) at org.apache.flink.contrib.streaming.state.RocksDBKeyedStateBackend$RocksDBSnapshotOperation.writeDBSnapshot(RocksDBKeyedStateBackend.java:382) at org.apache.flink.contrib.streaming.state.RocksDBKeyedStateBackend$1.performOperation(RocksDBKeyedStateBackend.java:280) at org.apache.flink.contrib.streaming.state.RocksDBKeyedStateBackend$1.performOperation(RocksDBKeyedStateBackend.java:262) at org.apache.flink.runtime.io.async.AbstractAsyncIOCallable.call(AbstractAsyncIOCallable.java:72) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at org.apache.flink.util.FutureUtil.runIfNotDoneAndGet(FutureUtil.java:37) ... 6 more {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)

[jira] [Created] (FLINK-5601) Window operator does not checkpoint watermarks
Ufuk Celebi created FLINK-5601: -- Summary: Window operator does not checkpoint watermarks Key: FLINK-5601 URL: https://issues.apache.org/jira/browse/FLINK-5601 Project: Flink Issue Type: Improvement Components: State Backends, Checkpointing Reporter: Ufuk Celebi During release testing [~stefanrichte...@gmail.com] and I noticed that watermarks are not checkpointed in the window operator. This can lead to non-determinism when restoring checkpoints. I was running an adjusted {{SessionWindowITCase}} via Kafka to test migration and rescaling and ran into failures because the data generator required deterministic behaviour. On restore, late elements were sometimes not dropped, because the watermarks first had to be re-established after the restore. [~aljoscha] Do you know whether there is a special reason for explicitly not checkpointing watermarks? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (FLINK-5600) Improve error message when triggering savepoint without specified directory
Ufuk Celebi created FLINK-5600: -- Summary: Improve error message when triggering savepoint without specified directory Key: FLINK-5600 URL: https://issues.apache.org/jira/browse/FLINK-5600 Project: Flink Issue Type: Improvement Components: State Backends, Checkpointing Reporter: Ufuk Celebi Priority: Minor When triggering a savepoint w/o specifying a custom target directory or having configured a default directory, we get a quite long stack trace: {code} java.lang.Exception: Failed to trigger savepoint at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$handleMessage$1.applyOrElse(JobManager.scala:801) at scala.runtime.AbstractPartialFunction$mcVL$sp.apply$mcVL$sp(AbstractPartialFunction.scala:33) at scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:33) at scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:25) at org.apache.flink.runtime.LeaderSessionMessageFilter$$anonfun$receive$1.applyOrElse(LeaderSessionMessageFilter.scala:36) at scala.runtime.AbstractPartialFunction$mcVL$sp.apply$mcVL$sp(AbstractPartialFunction.scala:33) at scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:33) at scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:25) at org.apache.flink.runtime.LogMessages$$anon$1.apply(LogMessages.scala:33) at org.apache.flink.runtime.LogMessages$$anon$1.apply(LogMessages.scala:28) at scala.PartialFunction$class.applyOrElse(PartialFunction.scala:118) at org.apache.flink.runtime.LogMessages$$anon$1.applyOrElse(LogMessages.scala:28) at akka.actor.Actor$class.aroundReceive(Actor.scala:467) at org.apache.flink.runtime.jobmanager.JobManager.aroundReceive(JobManager.scala:118) at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516) at akka.actor.ActorCell.invoke(ActorCell.scala:487) at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:238) at akka.dispatch.Mailbox.run(Mailbox.scala:220) at 
akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:397) at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260) at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339) at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979) at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107) Caused by: java.lang.IllegalStateException: No savepoint directory configured. You can either specify a directory when triggering this savepoint or configure a cluster-wide default via key 'state.savepoints.dir'. at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$handleMessage$1.applyOrElse(JobManager.scala:764) ... 22 more {code} This is already quite good, because the Exception says what can be done to work around this problem, but we can make it even better by handling this error in the client and printing a more explicit message. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
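The ticket proposes handling this in the client and printing a more explicit message. A minimal sketch of that idea (class and method names are hypothetical, not Flink's actual client code): walk the cause chain of the failure and surface the configuration message directly instead of dumping the full actor stack trace.

```java
// Hypothetical client-side handling: if the savepoint trigger failed because no
// target directory was configured, extract and print that message alone.
public class SavepointErrorReporter {

    public static String friendlyMessage(Throwable failure) {
        // Walk the cause chain looking for the configuration error.
        for (Throwable t = failure; t != null; t = t.getCause()) {
            if (t instanceof IllegalStateException) {
                // e.g. "No savepoint directory configured. You can either ..."
                return t.getMessage();
            }
        }
        // Fall back to the generic failure message.
        return "Failed to trigger savepoint: " + failure.getMessage();
    }
}
```

With this in place the CLI could print the one actionable line ("specify a directory or configure 'state.savepoints.dir'") instead of 20+ Akka frames.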
[jira] [Created] (FLINK-5599) State interface docs often refer to keyed state only
Ufuk Celebi created FLINK-5599: -- Summary: State interface docs often refer to keyed state only Key: FLINK-5599 URL: https://issues.apache.org/jira/browse/FLINK-5599 Project: Flink Issue Type: Improvement Components: State Backends, Checkpointing Reporter: Ufuk Celebi Priority: Minor The JavaDocs of the {{State}} interface (and related classes) often mention keyed state only as the state interface was only exposed for keyed state until Flink 1.1. With the new {{CheckpointedFunction}} interface, this has changed and the docs should be adjusted accordingly. Would be nice to address this with 1.2.0 so that the JavaDocs are updated for users. [~stefanrichte...@gmail.com] or [~aljoscha] maybe you can have a look at this briefly? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Closed] (FLINK-5528) tests: reduce the retry delay in QueryableStateITCase
[ https://issues.apache.org/jira/browse/FLINK-5528?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ufuk Celebi closed FLINK-5528. -- Resolution: Fixed Fix Version/s: 1.3.0 1.2.0 Fixed in 03a1f25 (release-1.2), 8d64263 (master). > tests: reduce the retry delay in QueryableStateITCase > - > > Key: FLINK-5528 > URL: https://issues.apache.org/jira/browse/FLINK-5528 > Project: Flink > Issue Type: Improvement > Components: Queryable State >Affects Versions: 1.2.0 >Reporter: Nico Kruber >Assignee: Nico Kruber > Fix For: 1.2.0, 1.3.0 > > > The QueryableStateITCase uses a retry of 1 second, e.g. if a queried key does > not exist yet. This seems a bit too conservative as the job may not take that > long to deploy and especially since getKvStateWithRetries() recovers from > failures by retrying. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Closed] (FLINK-5468) Restoring from a semi async rocksdb statebackend (1.1) to 1.2 fails with ClassNotFoundException
[ https://issues.apache.org/jira/browse/FLINK-5468?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ufuk Celebi closed FLINK-5468. -- Resolution: Fixed Fix Version/s: 1.3.0 1.2.0 Fixed in d431373 (release-1.2), 988729e (master). > Restoring from a semi async rocksdb statebackend (1.1) to 1.2 fails with > ClassNotFoundException > --- > > Key: FLINK-5468 > URL: https://issues.apache.org/jira/browse/FLINK-5468 > Project: Flink > Issue Type: Bug > Components: State Backends, Checkpointing >Affects Versions: 1.2.0 >Reporter: Robert Metzger >Assignee: Stefan Richter > Fix For: 1.2.0, 1.3.0 > > > I think we should catch this exception and explain what's going on and how > users can resolve the issue. > {code} > org.apache.flink.client.program.ProgramInvocationException: The program > execution failed: Job execution failed > at > org.apache.flink.client.program.ClusterClient.run(ClusterClient.java:427) > at > org.apache.flink.yarn.YarnClusterClient.submitJob(YarnClusterClient.java:210) > at > org.apache.flink.client.program.ClusterClient.run(ClusterClient.java:400) > at > org.apache.flink.streaming.api.environment.StreamContextEnvironment.execute(StreamContextEnvironment.java:66) > at com.dataartisans.eventwindow.Generator.main(Generator.java:60) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:606) > at > org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:528) > at > org.apache.flink.client.program.PackagedProgram.invokeInteractiveModeForExecution(PackagedProgram.java:419) > at > org.apache.flink.client.program.ClusterClient.run(ClusterClient.java:339) > at > org.apache.flink.client.CliFrontend.executeProgram(CliFrontend.java:831) > at 
org.apache.flink.client.CliFrontend.run(CliFrontend.java:256) > at > org.apache.flink.client.CliFrontend.parseParameters(CliFrontend.java:1073) > at org.apache.flink.client.CliFrontend$2.call(CliFrontend.java:1120) > at org.apache.flink.client.CliFrontend$2.call(CliFrontend.java:1117) > at > org.apache.flink.runtime.security.HadoopSecurityContext$1.run(HadoopSecurityContext.java:43) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:415) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548) > at > org.apache.flink.runtime.security.HadoopSecurityContext.runSecured(HadoopSecurityContext.java:40) > at org.apache.flink.client.CliFrontend.main(CliFrontend.java:1116) > Caused by: org.apache.flink.runtime.client.JobExecutionException: Job > execution failed > at > org.apache.flink.runtime.client.JobClient.awaitJobResult(JobClient.java:328) > at > org.apache.flink.runtime.client.JobClient.submitJobAndWait(JobClient.java:382) > at > org.apache.flink.client.program.ClusterClient.run(ClusterClient.java:423) > ... 
22 more > Caused by: java.io.IOException: java.lang.ClassNotFoundException: > org.apache.flink.contrib.streaming.state.RocksDBStateBackend$FinalSemiAsyncSnapshot > at > org.apache.flink.migration.runtime.checkpoint.savepoint.SavepointV0Serializer.deserialize(SavepointV0Serializer.java:162) > at > org.apache.flink.migration.runtime.checkpoint.savepoint.SavepointV0Serializer.deserialize(SavepointV0Serializer.java:70) > at > org.apache.flink.runtime.checkpoint.savepoint.SavepointStore.loadSavepoint(SavepointStore.java:138) > at > org.apache.flink.runtime.checkpoint.savepoint.SavepointLoader.loadAndValidateSavepoint(SavepointLoader.java:64) > at > org.apache.flink.runtime.jobmanager.JobManager$$anonfun$org$apache$flink$runtime$jobmanager$JobManager$$submitJob$1.apply$mcV$sp(JobManager.scala:1348) > at > org.apache.flink.runtime.jobmanager.JobManager$$anonfun$org$apache$flink$runtime$jobmanager$JobManager$$submitJob$1.apply(JobManager.scala:1330) > at > org.apache.flink.runtime.jobmanager.JobManager$$anonfun$org$apache$flink$runtime$jobmanager$JobManager$$submitJob$1.apply(JobManager.scala:1330) > at > scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24) > at > scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24) > at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:40) > at >
[jira] [Closed] (FLINK-5561) DataInputDeserializer#available returns one too few
[ https://issues.apache.org/jira/browse/FLINK-5561?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ufuk Celebi closed FLINK-5561. -- Resolution: Fixed Fix Version/s: 1.3.0 1.2.0 Fixed in 108865d (release-1.2), 6c4644d (master). > DataInputDeserializer#available returns one too few > --- > > Key: FLINK-5561 > URL: https://issues.apache.org/jira/browse/FLINK-5561 > Project: Flink > Issue Type: Bug >Affects Versions: 1.2.0 >Reporter: Nico Kruber >Assignee: Nico Kruber > Fix For: 1.2.0, 1.3.0 > > > DataInputDeserializer#available seems to assume that the position points to > the last read byte but instead it points to the next byte. Therefore, it > returns a value which is 1 smaller than the correct one. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
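The off-by-one described above can be reduced to a one-line arithmetic fix. This is a simplified sketch (field and method names are assumed, not copied from Flink): since `position` points at the *next* byte to read, the remaining count is `end - position`, not `end - position - 1`.

```java
// Sketch of the bug vs. the fix in DataInputDeserializer-style availability
// accounting. `end` is one past the last valid byte; `position` is the index
// of the next byte to read.
public class DeserializerAvailability {

    // Buggy variant: implicitly assumes `position` is the last byte already read,
    // so it undercounts by one.
    static int availableBuggy(int end, int position) {
        return end - position - 1;
    }

    // Fixed variant: `position` is the next byte to read, so the remaining
    // byte count is simply the distance to `end`.
    static int availableFixed(int end, int position) {
        return end - position;
    }
}
```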
[jira] [Closed] (FLINK-5515) fix unused kvState.getSerializedValue call in KvStateServerHandler
[ https://issues.apache.org/jira/browse/FLINK-5515?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ufuk Celebi closed FLINK-5515. -- Resolution: Fixed Fix Version/s: 1.3.0 1.2.0 Fixed in 6e85106 (release-1.2), ddd7c36 (master). > fix unused kvState.getSerializedValue call in KvStateServerHandler > -- > > Key: FLINK-5515 > URL: https://issues.apache.org/jira/browse/FLINK-5515 > Project: Flink > Issue Type: Improvement >Reporter: Nico Kruber > Fix For: 1.2.0, 1.3.0 > > > This was added in 4809f5367b08a9734fc1bd4875be51a9f3bb65aa and is probably a > left-over from a merge. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (FLINK-5587) AsyncWaitOperatorTest timed out on Travis
Ufuk Celebi created FLINK-5587: -- Summary: AsyncWaitOperatorTest timed out on Travis Key: FLINK-5587 URL: https://issues.apache.org/jira/browse/FLINK-5587 Project: Flink Issue Type: Test Reporter: Ufuk Celebi The Maven watch dog script cancelled the build and printed a stack trace for {{AsyncWaitOperatorTest.testOperatorChainWithProcessingTime(AsyncWaitOperatorTest.java:379)}}. https://s3.amazonaws.com/archive.travis-ci.org/jobs/192441719/log.txt -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Closed] (FLINK-5507) remove queryable list state sink
[ https://issues.apache.org/jira/browse/FLINK-5507?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ufuk Celebi closed FLINK-5507. -- Resolution: Fixed Fix Version/s: 1.3.0 1.2.0 Fixed in 1c8a48f (release-1.2), 63a6af3 (master). > remove queryable list state sink > > > Key: FLINK-5507 > URL: https://issues.apache.org/jira/browse/FLINK-5507 > Project: Flink > Issue Type: Improvement > Components: State Backends, Checkpointing >Affects Versions: 1.2.0 >Reporter: Nico Kruber >Assignee: Nico Kruber > Fix For: 1.2.0, 1.3.0 > > > The queryable state "sink" using ListState > (".asQueryableState(<name>, ListStateDescriptor)") stores all incoming data > forever and is never cleaned up. Eventually it will consume too much memory and > is thus of limited use. > We should remove it from the API. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Closed] (FLINK-5482) QueryableStateClient does not recover from a failed lookup due to a non-running job
[ https://issues.apache.org/jira/browse/FLINK-5482?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ufuk Celebi closed FLINK-5482. -- Resolution: Fixed Fix Version/s: 1.3.0 1.2.0 Fixed in 1db8102 (release-1.2), 7ff7f43 (master). > QueryableStateClient does not recover from a failed lookup due to a > non-running job > --- > > Key: FLINK-5482 > URL: https://issues.apache.org/jira/browse/FLINK-5482 > Project: Flink > Issue Type: Bug >Affects Versions: 1.2.0 >Reporter: Nico Kruber >Assignee: Nico Kruber > Fix For: 1.2.0, 1.3.0 > > > When the QueryableStateClient is used to issue a query but the job is not > running yet, its internal lookup result is cached with an > IllegalStateException that the job was not found. It does, however, never > recover from that. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Closed] (FLINK-5556) BarrierBuffer resets bytes written on spiller roll over
[ https://issues.apache.org/jira/browse/FLINK-5556?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ufuk Celebi closed FLINK-5556. -- Resolution: Fixed Fix Version/s: 1.3.0 1.2.0 Fixed in a4853ba (release-1.2), e1b2cd0 (master). > BarrierBuffer resets bytes written on spiller roll over > --- > > Key: FLINK-5556 > URL: https://issues.apache.org/jira/browse/FLINK-5556 > Project: Flink > Issue Type: Bug > Components: State Backends, Checkpointing >Reporter: Ufuk Celebi >Assignee: Ufuk Celebi >Priority: Minor > Fix For: 1.2.0, 1.3.0 > > > When rolling over a spilled sequence of buffers, the tracker bytes written > are reset to 0. They are reported to the checkpoint listener right after this > operation, which results in the reported buffered bytes always being 0. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
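The ordering bug above (counter zeroed before the listener reads it) boils down to capturing the value before the reset. A minimal sketch, with illustrative names rather than Flink's actual {{BarrierBuffer}} internals:

```java
// Hedged sketch: if roll-over resets `bytesWritten` to 0 and the checkpoint
// listener reads the counter afterwards, the reported buffered bytes are
// always 0. Returning the captured value from the roll-over fixes the report.
public class SpillCounter {
    private long bytesWritten;

    public void write(long n) {
        bytesWritten += n;
    }

    // Finishes the current spilled sequence: report the bytes written so far,
    // capturing the value BEFORE resetting the counter for the next sequence.
    public long rollOver() {
        long reported = bytesWritten;
        bytesWritten = 0;
        return reported;
    }
}
```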
[jira] [Closed] (FLINK-5574) Add checkpoint statistics docs
[ https://issues.apache.org/jira/browse/FLINK-5574?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ufuk Celebi closed FLINK-5574. -- Resolution: Fixed Fix Version/s: 1.3.0 1.2.0 Fixed in 75f3681 (release-1.2), 6fb9ebc (master). > Add checkpoint statistics docs > -- > > Key: FLINK-5574 > URL: https://issues.apache.org/jira/browse/FLINK-5574 > Project: Flink > Issue Type: Improvement > Components: Documentation >Reporter: Ufuk Celebi >Assignee: Ufuk Celebi >Priority: Minor > Fix For: 1.2.0, 1.3.0 > > > Add docs about the current state of checkpoint monitoring. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (FLINK-5574) Add checkpoint statistics docs
Ufuk Celebi created FLINK-5574: -- Summary: Add checkpoint statistics docs Key: FLINK-5574 URL: https://issues.apache.org/jira/browse/FLINK-5574 Project: Flink Issue Type: Improvement Components: Documentation Reporter: Ufuk Celebi Assignee: Ufuk Celebi Priority: Minor Add docs about the current state of checkpoint monitoring. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Closed] (FLINK-5560) Header in checkpoint stats summary misaligned
[ https://issues.apache.org/jira/browse/FLINK-5560?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ufuk Celebi closed FLINK-5560. -- Resolution: Fixed Fix Version/s: 1.3.0 1.2.0 Fixed in 772884e (release-1.2), 1be76f6 (master). > Header in checkpoint stats summary misaligned > - > > Key: FLINK-5560 > URL: https://issues.apache.org/jira/browse/FLINK-5560 > Project: Flink > Issue Type: Bug > Components: Webfrontend >Reporter: Ufuk Celebi >Assignee: Ufuk Celebi >Priority: Minor > Fix For: 1.2.0, 1.3.0 > > > The checkpoint summary stats table header line is misaligned. The first and > second head columns need to be swapped. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Closed] (FLINK-5377) Improve savepoint docs
[ https://issues.apache.org/jira/browse/FLINK-5377?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ufuk Celebi closed FLINK-5377. -- Resolution: Fixed Fix Version/s: 1.3.0 1.2.0 Fixed in 57a3955 (release-1.2), 6d8b3f5 (master). > Improve savepoint docs > -- > > Key: FLINK-5377 > URL: https://issues.apache.org/jira/browse/FLINK-5377 > Project: Flink > Issue Type: Improvement > Components: Documentation >Affects Versions: 1.2.0, 1.1.3 >Reporter: Ufuk Celebi >Assignee: Ufuk Celebi > Fix For: 1.2.0, 1.3.0 > > > The savepoint docs are very detailed and focus on the internals. They should > better convey what users have to take care of. > The following questions should be answered: > What happens if I add a new operator that requires state to my flow? > What happens if I delete an operator that has state to my flow? > What happens if I reorder stateful operators in my flow? > What happens if I add or delete or reorder operators that have no state in my > flow? > Should I apply .uid to all operators in my flow? > Should I apply .uid to only the operators that have state? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-5415) ContinuousFileProcessingTest failed on travis
[ https://issues.apache.org/jira/browse/FLINK-5415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15829692#comment-15829692 ] Ufuk Celebi commented on FLINK-5415: https://s3.amazonaws.com/archive.travis-ci.org/jobs/193115046/log.txt > ContinuousFileProcessingTest failed on travis > - > > Key: FLINK-5415 > URL: https://issues.apache.org/jira/browse/FLINK-5415 > Project: Flink > Issue Type: Improvement >Affects Versions: 1.3.0 >Reporter: Chesnay Schepler > > https://s3.amazonaws.com/archive.travis-ci.org/jobs/189171123/log.txt > testFunctionRestore(org.apache.flink.hdfstests.ContinuousFileProcessingTest) > Time elapsed: 0.162 sec <<< FAILURE! > java.lang.AssertionError: expected:<1483623669528> but > was:<-9223372036854775808> > at org.junit.Assert.fail(Assert.java:88) > at org.junit.Assert.failNotEquals(Assert.java:834) > at org.junit.Assert.assertEquals(Assert.java:645) > at org.junit.Assert.assertEquals(Assert.java:631) > at > org.apache.flink.hdfstests.ContinuousFileProcessingTest.testFunctionRestore(ContinuousFileProcessingTest.java:761) > testProcessOnce(org.apache.flink.hdfstests.ContinuousFileProcessingTest) > Time elapsed: 0.045 sec <<< FAILURE! > java.lang.AssertionError: null > at org.junit.Assert.fail(Assert.java:86) > at org.junit.Assert.assertTrue(Assert.java:41) > at org.junit.Assert.assertFalse(Assert.java:64) > at org.junit.Assert.assertFalse(Assert.java:74) > at > org.apache.flink.hdfstests.ContinuousFileProcessingTest.createFileAndFillWithData(ContinuousFileProcessingTest.java:958) > at > org.apache.flink.hdfstests.ContinuousFileProcessingTest.testProcessOnce(ContinuousFileProcessingTest.java:675) > testFileReadingOperatorWithIngestionTime(org.apache.flink.hdfstests.ContinuousFileProcessingTest) > Time elapsed: 0.002 sec <<< FAILURE! 
> java.lang.AssertionError: null > at org.junit.Assert.fail(Assert.java:86) > at org.junit.Assert.assertTrue(Assert.java:41) > at org.junit.Assert.assertFalse(Assert.java:64) > at org.junit.Assert.assertFalse(Assert.java:74) > at > org.apache.flink.hdfstests.ContinuousFileProcessingTest.createFileAndFillWithData(ContinuousFileProcessingTest.java:958) > at > org.apache.flink.hdfstests.ContinuousFileProcessingTest.testFileReadingOperatorWithIngestionTime(ContinuousFileProcessingTest.java:150) > testSortingOnModTime(org.apache.flink.hdfstests.ContinuousFileProcessingTest) > Time elapsed: 0.002 sec <<< FAILURE! > java.lang.AssertionError: null > at org.junit.Assert.fail(Assert.java:86) > at org.junit.Assert.assertTrue(Assert.java:41) > at org.junit.Assert.assertFalse(Assert.java:64) > at org.junit.Assert.assertFalse(Assert.java:74) > at > org.apache.flink.hdfstests.ContinuousFileProcessingTest.createFileAndFillWithData(ContinuousFileProcessingTest.java:958) > at > org.apache.flink.hdfstests.ContinuousFileProcessingTest.testSortingOnModTime(ContinuousFileProcessingTest.java:596) > testFileReadingOperatorWithEventTime(org.apache.flink.hdfstests.ContinuousFileProcessingTest) > Time elapsed: 0.001 sec <<< FAILURE! > java.lang.AssertionError: null > at org.junit.Assert.fail(Assert.java:86) > at org.junit.Assert.assertTrue(Assert.java:41) > at org.junit.Assert.assertFalse(Assert.java:64) > at org.junit.Assert.assertFalse(Assert.java:74) > at > org.apache.flink.hdfstests.ContinuousFileProcessingTest.createFileAndFillWithData(ContinuousFileProcessingTest.java:958) > at > org.apache.flink.hdfstests.ContinuousFileProcessingTest.testFileReadingOperatorWithEventTime(ContinuousFileProcessingTest.java:308) > testProcessContinuously(org.apache.flink.hdfstests.ContinuousFileProcessingTest) > Time elapsed: 0.001 sec <<< FAILURE! 
> java.lang.AssertionError: null > at org.junit.Assert.fail(Assert.java:86) > at org.junit.Assert.assertTrue(Assert.java:41) > at org.junit.Assert.assertFalse(Assert.java:64) > at org.junit.Assert.assertFalse(Assert.java:74) > at > org.apache.flink.hdfstests.ContinuousFileProcessingTest.createFileAndFillWithData(ContinuousFileProcessingTest.java:958) > at > org.apache.flink.hdfstests.ContinuousFileProcessingTest.testProcessContinuously(ContinuousFileProcessingTest.java:771) > Results : > Failed tests: > > ContinuousFileProcessingTest.testFileReadingOperatorWithEventTime:308->createFileAndFillWithData:958 > null > > ContinuousFileProcessingTest.testFileReadingOperatorWithIngestionTime:150->createFileAndFillWithData:958 > null > ContinuousFileProcessingTest.testFunctionRestore:761 > expected:<1483623669528> but was:<-9223372036854775808> > >
[jira] [Created] (FLINK-5560) Header in checkpoint stats summary misaligned
Ufuk Celebi created FLINK-5560: -- Summary: Header in checkpoint stats summary misaligned Key: FLINK-5560 URL: https://issues.apache.org/jira/browse/FLINK-5560 Project: Flink Issue Type: Bug Components: Webfrontend Reporter: Ufuk Celebi Assignee: Ufuk Celebi Priority: Minor The checkpoint summary stats table header line is misaligned. The first and second head columns need to be swapped. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (FLINK-5556) BarrierBuffer resets bytes written on spiller roll over
Ufuk Celebi created FLINK-5556: -- Summary: BarrierBuffer resets bytes written on spiller roll over Key: FLINK-5556 URL: https://issues.apache.org/jira/browse/FLINK-5556 Project: Flink Issue Type: Bug Components: State Backends, Checkpointing Reporter: Ufuk Celebi Assignee: Ufuk Celebi Priority: Minor When rolling over a spilled sequence of buffers, the tracker bytes written are reset to 0. They are reported to the checkpoint listener right after this operation, which results in the reported buffered bytes always being 0. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Closed] (FLINK-5484) Kryo serialization changed between 1.1 and 1.2
[ https://issues.apache.org/jira/browse/FLINK-5484?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ufuk Celebi closed FLINK-5484. -- Resolution: Fixed Fix Version/s: 1.3.0 1.2.0 Fixed in 931929b (release-1.1), 55483b7, a7644b1 (release-1.2), 8fddae8, 586f818 (master). > Kryo serialization changed between 1.1 and 1.2 > -- > > Key: FLINK-5484 > URL: https://issues.apache.org/jira/browse/FLINK-5484 > Project: Flink > Issue Type: Bug > Components: Type Serialization System >Reporter: Ufuk Celebi > Fix For: 1.2.0, 1.3.0 > > > I think the way that Kryo serializes data changed between 1.1 and 1.2. > I have a generic Object that is serialized as part of a 1.1 savepoint that I > cannot resume from with 1.2: > {code} > org.apache.flink.client.program.ProgramInvocationException: The program > execution failed: Job execution failed. > at > org.apache.flink.client.program.ClusterClient.run(ClusterClient.java:427) > at > org.apache.flink.client.program.StandaloneClusterClient.submitJob(StandaloneClusterClient.java:101) > at > org.apache.flink.client.program.ClusterClient.run(ClusterClient.java:400) > at > org.apache.flink.streaming.api.environment.StreamContextEnvironment.execute(StreamContextEnvironment.java:68) > at > org.apache.flink.streaming.api.environment.StreamExecutionEnvironment.execute(StreamExecutionEnvironment.java:1486) > at com.dataartisans.DidKryoChange.main(DidKryoChange.java:74) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:528) > at > org.apache.flink.client.program.PackagedProgram.invokeInteractiveModeForExecution(PackagedProgram.java:419) > at > 
org.apache.flink.client.program.ClusterClient.run(ClusterClient.java:339) > at > org.apache.flink.client.CliFrontend.executeProgram(CliFrontend.java:831) > at org.apache.flink.client.CliFrontend.run(CliFrontend.java:256) > at > org.apache.flink.client.CliFrontend.parseParameters(CliFrontend.java:1073) > at org.apache.flink.client.CliFrontend$2.call(CliFrontend.java:1120) > at org.apache.flink.client.CliFrontend$2.call(CliFrontend.java:1117) > at > org.apache.flink.runtime.security.HadoopSecurityContext$1.run(HadoopSecurityContext.java:43) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548) > at > org.apache.flink.runtime.security.HadoopSecurityContext.runSecured(HadoopSecurityContext.java:40) > at org.apache.flink.client.CliFrontend.main(CliFrontend.java:1117) > Caused by: org.apache.flink.runtime.client.JobExecutionException: Job > execution failed. 
> at > org.apache.flink.runtime.jobmanager.JobManager$$anonfun$handleMessage$1$$anonfun$applyOrElse$7.apply$mcV$sp(JobManager.scala:900) > at > org.apache.flink.runtime.jobmanager.JobManager$$anonfun$handleMessage$1$$anonfun$applyOrElse$7.apply(JobManager.scala:843) > at > org.apache.flink.runtime.jobmanager.JobManager$$anonfun$handleMessage$1$$anonfun$applyOrElse$7.apply(JobManager.scala:843) > at > scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24) > at > scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24) > at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:40) > at > akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:397) > at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260) > at > scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339) > at > scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979) > at > scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107) > Caused by: java.lang.IllegalStateException: Could not initialize keyed state > backend. > at > org.apache.flink.streaming.api.operators.AbstractStreamOperator.initKeyedState(AbstractStreamOperator.java:286) > at > org.apache.flink.streaming.api.operators.AbstractStreamOperator.initializeState(AbstractStreamOperator.java:199) > at > org.apache.flink.streaming.runtime.tasks.StreamTask.initializeOperators(StreamTask.java:649) > at >
[jira] [Commented] (FLINK-2608) Arrays.asList(..) does not work with CollectionInputFormat
[ https://issues.apache.org/jira/browse/FLINK-2608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15828174#comment-15828174 ] Ufuk Celebi commented on FLINK-2608: Reverted in a7644b1 (release-1.2), 586f818 (master). > Arrays.asList(..) does not work with CollectionInputFormat > -- > > Key: FLINK-2608 > URL: https://issues.apache.org/jira/browse/FLINK-2608 > Project: Flink > Issue Type: Bug > Components: Type Serialization System >Affects Versions: 0.9, 0.10.0 >Reporter: Maximilian Michels >Assignee: Alexander Chermenin >Priority: Minor > Fix For: 1.2.0 > > > When using Arrays.asList(..) as input for a CollectionInputFormat, the > serialization/deserialization fails when deploying the task. > See the following program: > {code:java} > public class WordCountExample { > public static void main(String[] args) throws Exception { > final ExecutionEnvironment env = > ExecutionEnvironment.getExecutionEnvironment(); > DataSet<String> text = env.fromElements( > "Who's there?", > "I think I hear them. Stand, ho! 
Who's there?"); > // DOES NOT WORK > List<Integer> elements = Arrays.asList(0, 0, 0); > // The following works: > //List<Integer> elements = new ArrayList<>(new int[] {0,0,0}); > DataSet<TestClass> set = env.fromElements(new TestClass(elements)); > DataSet<Tuple2<String, Integer>> wordCounts = text > .flatMap(new LineSplitter()) > .withBroadcastSet(set, "set") > .groupBy(0) > .sum(1); > wordCounts.print(); > } > public static class LineSplitter implements FlatMapFunction<String, Tuple2<String, Integer>> { > @Override > public void flatMap(String line, Collector<Tuple2<String, Integer>> out) { > for (String word : line.split(" ")) { > out.collect(new Tuple2<String, Integer>(word, 1)); > } > } > } > public static class TestClass implements Serializable { > private static final long serialVersionUID = -2932037991574118651L; > List<Integer> integerList; > public TestClass(List<Integer> integerList){ > this.integerList=integerList; > } > } > {code} > {noformat} > Exception in thread "main" > org.apache.flink.runtime.client.JobExecutionException: Cannot initialize task > 'DataSource (at main(Test.java:32) > (org.apache.flink.api.java.io.CollectionInputFormat))': Deserializing the > InputFormat ([mytests.Test$TestClass@4d6025c5]) failed: unread block data > at > org.apache.flink.runtime.jobmanager.JobManager$$anonfun$org$apache$flink$runtime$jobmanager$JobManager$$submitJob$4.apply(JobManager.scala:523) > at > org.apache.flink.runtime.jobmanager.JobManager$$anonfun$org$apache$flink$runtime$jobmanager$JobManager$$submitJob$4.apply(JobManager.scala:507) > at scala.collection.Iterator$class.foreach(Iterator.scala:727) > at scala.collection.AbstractIterator.foreach(Iterator.scala:1157) > at scala.collection.IterableLike$class.foreach(IterableLike.scala:72) > at scala.collection.AbstractIterable.foreach(Iterable.scala:54) > at > org.apache.flink.runtime.jobmanager.JobManager.org$apache$flink$runtime$jobmanager$JobManager$$submitJob(JobManager.scala:507) > at > org.apache.flink.runtime.jobmanager.JobManager$$anonfun$receiveWithLogMessages$1.applyOrElse(JobManager.scala:190) > at > 
scala.runtime.AbstractPartialFunction$mcVL$sp.apply$mcVL$sp(AbstractPartialFunction.scala:33) > at > scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:33) > at > scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:25) > at > org.apache.flink.runtime.ActorLogMessages$$anon$1.apply(ActorLogMessages.scala:43) > at > org.apache.flink.runtime.ActorLogMessages$$anon$1.apply(ActorLogMessages.scala:29) > at scala.PartialFunction$class.applyOrElse(PartialFunction.scala:118) > at > org.apache.flink.runtime.ActorLogMessages$$anon$1.applyOrElse(ActorLogMessages.scala:29) > at akka.actor.Actor$class.aroundReceive(Actor.scala:465) > at > org.apache.flink.runtime.jobmanager.JobManager.aroundReceive(JobManager.scala:92) > at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516) > at akka.actor.ActorCell.invoke(ActorCell.scala:487) > at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:254) > at akka.dispatch.Mailbox.run(Mailbox.scala:221) > at akka.dispatch.Mailbox.exec(Mailbox.scala:231) > at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260) > at > scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339) > at >
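The failure above stems from Arrays.asList(..) returning the private java.util.Arrays$ArrayList type, which the serializer stack (Kryo/Chill, per the later reopening comment) does not round-trip reliably. A minimal sketch of the usual workaround, copying the view into a plain java.util.ArrayList before handing it to Flink (the class name here is illustrative, not from the report):

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class AsListWorkaround {

    // Copy the fixed-size view returned by Arrays.asList into a plain
    // ArrayList, a public type that serializers handle without surprises.
    public static List<Integer> safeCopy(List<Integer> in) {
        return new ArrayList<>(in);
    }

    // Round-trip the copy through serialization to demonstrate that the
    // plain ArrayList survives ser/de intact.
    @SuppressWarnings("unchecked")
    public static List<Integer> roundTrip(List<Integer> list) throws Exception {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(bos)) {
            oos.writeObject(safeCopy(list));
        }
        try (ObjectInputStream ois =
                new ObjectInputStream(new ByteArrayInputStream(bos.toByteArray()))) {
            return (List<Integer>) ois.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        List<Integer> copy = safeCopy(Arrays.asList(0, 0, 0));
        System.out.println(copy.getClass().getSimpleName()); // ArrayList
        System.out.println(roundTrip(copy)); // [0, 0, 0]
    }
}
```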
[jira] [Reopened] (FLINK-2608) Arrays.asList(..) does not work with CollectionInputFormat
[ https://issues.apache.org/jira/browse/FLINK-2608?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ufuk Celebi reopened FLINK-2608: Have to reopen this because we can't update our Chill dependency without breaking savepoint compatibility between 1.1 and 1.2. > Arrays.asList(..) does not work with CollectionInputFormat > -- > > Key: FLINK-2608 > URL: https://issues.apache.org/jira/browse/FLINK-2608 > Project: Flink > Issue Type: Bug > Components: Type Serialization System >Affects Versions: 0.9, 0.10.0 >Reporter: Maximilian Michels >Assignee: Alexander Chermenin >Priority: Minor > Fix For: 1.2.0 > > > When using Arrays.asList(..) as input for a CollectionInputFormat, the > serialization/deserialization fails when deploying the task. > See the following program: > {code:java} > public class WordCountExample { > public static void main(String[] args) throws Exception { > final ExecutionEnvironment env = > ExecutionEnvironment.getExecutionEnvironment(); > DataSet<String> text = env.fromElements( > "Who's there?", > "I think I hear them. Stand, ho! 
Who's there?"); > // DOES NOT WORK > List<Integer> elements = Arrays.asList(0, 0, 0); > // The following works: > //List<Integer> elements = new ArrayList<>(new int[] {0,0,0}); > DataSet<TestClass> set = env.fromElements(new TestClass(elements)); > DataSet<Tuple2<String, Integer>> wordCounts = text > .flatMap(new LineSplitter()) > .withBroadcastSet(set, "set") > .groupBy(0) > .sum(1); > wordCounts.print(); > } > public static class LineSplitter implements FlatMapFunction<String, Tuple2<String, Integer>> { > @Override > public void flatMap(String line, Collector<Tuple2<String, Integer>> out) { > for (String word : line.split(" ")) { > out.collect(new Tuple2<String, Integer>(word, 1)); > } > } > } > public static class TestClass implements Serializable { > private static final long serialVersionUID = -2932037991574118651L; > List<Integer> integerList; > public TestClass(List<Integer> integerList){ > this.integerList=integerList; > } > } > {code} > {noformat} > Exception in thread "main" > org.apache.flink.runtime.client.JobExecutionException: Cannot initialize task > 'DataSource (at main(Test.java:32) > (org.apache.flink.api.java.io.CollectionInputFormat))': Deserializing the > InputFormat ([mytests.Test$TestClass@4d6025c5]) failed: unread block data > at > org.apache.flink.runtime.jobmanager.JobManager$$anonfun$org$apache$flink$runtime$jobmanager$JobManager$$submitJob$4.apply(JobManager.scala:523) > at > org.apache.flink.runtime.jobmanager.JobManager$$anonfun$org$apache$flink$runtime$jobmanager$JobManager$$submitJob$4.apply(JobManager.scala:507) > at scala.collection.Iterator$class.foreach(Iterator.scala:727) > at scala.collection.AbstractIterator.foreach(Iterator.scala:1157) > at scala.collection.IterableLike$class.foreach(IterableLike.scala:72) > at scala.collection.AbstractIterable.foreach(Iterable.scala:54) > at > org.apache.flink.runtime.jobmanager.JobManager.org$apache$flink$runtime$jobmanager$JobManager$$submitJob(JobManager.scala:507) > at > org.apache.flink.runtime.jobmanager.JobManager$$anonfun$receiveWithLogMessages$1.applyOrElse(JobManager.scala:190) > at > 
scala.runtime.AbstractPartialFunction$mcVL$sp.apply$mcVL$sp(AbstractPartialFunction.scala:33) > at > scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:33) > at > scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:25) > at > org.apache.flink.runtime.ActorLogMessages$$anon$1.apply(ActorLogMessages.scala:43) > at > org.apache.flink.runtime.ActorLogMessages$$anon$1.apply(ActorLogMessages.scala:29) > at scala.PartialFunction$class.applyOrElse(PartialFunction.scala:118) > at > org.apache.flink.runtime.ActorLogMessages$$anon$1.applyOrElse(ActorLogMessages.scala:29) > at akka.actor.Actor$class.aroundReceive(Actor.scala:465) > at > org.apache.flink.runtime.jobmanager.JobManager.aroundReceive(JobManager.scala:92) > at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516) > at akka.actor.ActorCell.invoke(ActorCell.scala:487) > at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:254) > at akka.dispatch.Mailbox.run(Mailbox.scala:221) > at akka.dispatch.Mailbox.exec(Mailbox.scala:231) > at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260) > at > scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339) > at >
[jira] [Created] (FLINK-5510) Replace Scala Future with FlinkFuture in QueryableStateClient
Ufuk Celebi created FLINK-5510: -- Summary: Replace Scala Future with FlinkFuture in QueryableStateClient Key: FLINK-5510 URL: https://issues.apache.org/jira/browse/FLINK-5510 Project: Flink Issue Type: Improvement Components: Queryable State Reporter: Ufuk Celebi Priority: Minor The entry point for queryable state users is the {{QueryableStateClient}} which returns query results via Scala Futures. Since merging the initial version of QueryableState we have introduced the FlinkFuture wrapper type in order to not expose our Scala dependency via the API. Since APIs tend to stick around longer than expected, it might be worthwhile to port the exposed QueryableStateClient interface to use the FlinkFuture. Early users can still get the Scala Future via FlinkFuture#getScalaFuture(). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
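The proposal above boils down to placing an API-owned future type between the client and its internal implementation, with an escape hatch for users who still want the underlying future. A hypothetical sketch of that pattern, using CompletableFuture as a stand-in for the internal Scala Future (all names here are illustrative, not Flink's actual API):

```java
import java.util.concurrent.CompletableFuture;

public class FlinkFutureSketch {

    // Minimal API-owned wrapper around the internal future implementation,
    // playing the role of FlinkFuture in the proposal.
    public static final class WrappedFuture<T> {
        private final CompletableFuture<T> delegate; // stand-in for scala.concurrent.Future

        WrappedFuture(CompletableFuture<T> delegate) {
            this.delegate = delegate;
        }

        public T get() throws Exception {
            return delegate.get();
        }

        // Escape hatch, analogous to FlinkFuture#getScalaFuture().
        public CompletableFuture<T> getUnderlying() {
            return delegate;
        }
    }

    // The public query method exposes only the wrapper type, so the
    // internal future library never leaks into the API surface.
    public static WrappedFuture<byte[]> getKvState(String queryableName) {
        return new WrappedFuture<>(
                CompletableFuture.completedFuture(queryableName.getBytes()));
    }

    public static void main(String[] args) throws Exception {
        System.out.println(getKvState("count").get().length); // 5
    }
}
```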
[jira] [Created] (FLINK-5509) Replace QueryableStateClient keyHashCode argument
Ufuk Celebi created FLINK-5509: -- Summary: Replace QueryableStateClient keyHashCode argument Key: FLINK-5509 URL: https://issues.apache.org/jira/browse/FLINK-5509 Project: Flink Issue Type: Improvement Components: Queryable State Reporter: Ufuk Celebi Priority: Minor When going over the low level QueryableStateClient with [~NicoK] we noticed that the key hashCode argument can be confusing to users: {code} Future<byte[]> getKvState( JobID jobId, String name, int keyHashCode, byte[] serializedKeyAndNamespace) {code} The {{keyHashCode}} argument is the result of calling {{hashCode()}} on the key to look up. This is what is sent to the JobManager in order to look up the location of the key. While pretty straightforward, it is repetitive and possibly confusing. As an alternative, we suggest making the method generic and simply calling hashCode on the object ourselves. This way the user just provides the key object. Since there are some early users of the queryable state API already, we would suggest renaming the method in order to provoke a compilation error after upgrading to the actually released 1.2 version. (This would also work without renaming since the hashCode of Integer (what users currently provide) is the same number, but it would be confusing why it actually works.) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
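The suggested change can be sketched as follows: instead of asking callers for key.hashCode(), take the key itself and hash it internally. Method and class names below are illustrative, not Flink's actual API; the main method also shows why existing Integer-key callers would keep working either way:

```java
public class KeyHashSketch {

    // Old style: the caller computes and passes the raw hash.
    public static int locateByHash(int keyHashCode, int numKeyGroups) {
        // Simplified placement; ignores negative-hash edge cases for brevity.
        return Math.abs(keyHashCode % numKeyGroups);
    }

    // Proposed style: generic over the key type, hash computed here,
    // so the caller just hands over the key object.
    public static <K> int locate(K key, int numKeyGroups) {
        return locateByHash(key.hashCode(), numKeyGroups);
    }

    public static void main(String[] args) {
        // Integer's hashCode is its own value, which is why callers who were
        // passing hashCodes of Integer keys would silently keep working.
        System.out.println(Integer.valueOf(42).hashCode()); // 42
        System.out.println(locate(42, 128) == locateByHash(42, 128)); // true
    }
}
```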
[jira] [Commented] (FLINK-5507) remove queryable list state sink
[ https://issues.apache.org/jira/browse/FLINK-5507?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15823895#comment-15823895 ] Ufuk Celebi commented on FLINK-5507: +1 Would you like to do that and open a PR? > remove queryable list state sink > > > Key: FLINK-5507 > URL: https://issues.apache.org/jira/browse/FLINK-5507 > Project: Flink > Issue Type: Improvement > Components: State Backends, Checkpointing >Affects Versions: 1.2.0 >Reporter: Nico Kruber >Assignee: Nico Kruber > > The queryable state "sink" using ListState > (".asQueryableState(, ListStateDescriptor)") stores all incoming data > forever and is never cleaned. Eventually, it will pile up too much memory and > is thus of limited use. > We should remove it from the API. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (FLINK-5484) Kryo serialization changed between 1.1 and 1.2
Ufuk Celebi created FLINK-5484: -- Summary: Kryo serialization changed between 1.1 and 1.2 Key: FLINK-5484 URL: https://issues.apache.org/jira/browse/FLINK-5484 Project: Flink Issue Type: Bug Components: Type Serialization System Reporter: Ufuk Celebi I think the way that Kryo serializes data changed between 1.1 and 1.2. I have a generic Object that is serialized as part of a 1.1 savepoint that I cannot resume from with 1.2: {code} org.apache.flink.client.program.ProgramInvocationException: The program execution failed: Job execution failed. at org.apache.flink.client.program.ClusterClient.run(ClusterClient.java:427) at org.apache.flink.client.program.StandaloneClusterClient.submitJob(StandaloneClusterClient.java:101) at org.apache.flink.client.program.ClusterClient.run(ClusterClient.java:400) at org.apache.flink.streaming.api.environment.StreamContextEnvironment.execute(StreamContextEnvironment.java:68) at org.apache.flink.streaming.api.environment.StreamExecutionEnvironment.execute(StreamExecutionEnvironment.java:1486) at com.dataartisans.DidKryoChange.main(DidKryoChange.java:74) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:528) at org.apache.flink.client.program.PackagedProgram.invokeInteractiveModeForExecution(PackagedProgram.java:419) at org.apache.flink.client.program.ClusterClient.run(ClusterClient.java:339) at org.apache.flink.client.CliFrontend.executeProgram(CliFrontend.java:831) at org.apache.flink.client.CliFrontend.run(CliFrontend.java:256) at org.apache.flink.client.CliFrontend.parseParameters(CliFrontend.java:1073) at org.apache.flink.client.CliFrontend$2.call(CliFrontend.java:1120) at 
org.apache.flink.client.CliFrontend$2.call(CliFrontend.java:1117) at org.apache.flink.runtime.security.HadoopSecurityContext$1.run(HadoopSecurityContext.java:43) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:422) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548) at org.apache.flink.runtime.security.HadoopSecurityContext.runSecured(HadoopSecurityContext.java:40) at org.apache.flink.client.CliFrontend.main(CliFrontend.java:1117) Caused by: org.apache.flink.runtime.client.JobExecutionException: Job execution failed. at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$handleMessage$1$$anonfun$applyOrElse$7.apply$mcV$sp(JobManager.scala:900) at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$handleMessage$1$$anonfun$applyOrElse$7.apply(JobManager.scala:843) at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$handleMessage$1$$anonfun$applyOrElse$7.apply(JobManager.scala:843) at scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24) at scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24) at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:40) at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:397) at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260) at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339) at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979) at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107) Caused by: java.lang.IllegalStateException: Could not initialize keyed state backend. 
at org.apache.flink.streaming.api.operators.AbstractStreamOperator.initKeyedState(AbstractStreamOperator.java:286) at org.apache.flink.streaming.api.operators.AbstractStreamOperator.initializeState(AbstractStreamOperator.java:199) at org.apache.flink.streaming.runtime.tasks.StreamTask.initializeOperators(StreamTask.java:649) at org.apache.flink.streaming.runtime.tasks.StreamTask.initializeState(StreamTask.java:636) at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:254) at org.apache.flink.runtime.taskmanager.Task.run(Task.java:654) at java.lang.Thread.run(Thread.java:745) Caused by: java.lang.RuntimeException: com.esotericsoftware.kryo.KryoException: Unable to find class: f at
[jira] [Closed] (FLINK-5467) Stateless chained tasks set legacy operator state
[ https://issues.apache.org/jira/browse/FLINK-5467?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ufuk Celebi closed FLINK-5467. -- Resolution: Fixed Fix Version/s: 1.3.0 1.2.0 Fixed in af244aa (release-1.2) 51a3573 (master). > Stateless chained tasks set legacy operator state > - > > Key: FLINK-5467 > URL: https://issues.apache.org/jira/browse/FLINK-5467 > Project: Flink > Issue Type: Bug > Components: State Backends, Checkpointing >Reporter: Ufuk Celebi >Assignee: Stefan Richter > Fix For: 1.2.0, 1.3.0 > > > I discovered this while trying to rescale a job with a Kafka source with a > chained stateless operator. > Looking into it, it turns out that this fails, because the checkpointed state > contains legacy operator state for the chained operator although it is > stateless. > /cc [~aljoscha] You mentioned that this might be a possible duplicate? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (FLINK-5477) Summary columns not aligned correctly
Ufuk Celebi created FLINK-5477: -- Summary: Summary columns not aligned correctly Key: FLINK-5477 URL: https://issues.apache.org/jira/browse/FLINK-5477 Project: Flink Issue Type: Bug Components: Webfrontend Reporter: Ufuk Celebi Assignee: Ufuk Celebi Priority: Minor [~rmetzger] reported that the summary stats columns in the reworked checkpoint stats are not correctly aligned. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Closed] (FLINK-5444) Flink UI uses absolute URLs.
[ https://issues.apache.org/jira/browse/FLINK-5444?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ufuk Celebi closed FLINK-5444. -- Resolution: Fixed Fix Version/s: 1.3.0 1.2.0 Fixed in dff58df, 80f1517 (release-1.2), 42b53e6, d7e862a (master). > Flink UI uses absolute URLs. > > > Key: FLINK-5444 > URL: https://issues.apache.org/jira/browse/FLINK-5444 > Project: Flink > Issue Type: Bug >Reporter: Joerg Schad >Assignee: Joerg Schad > Fix For: 1.2.0, 1.3.0 > > > The Flink UI has a mixed use of absolute and relative links. See for example > [here](https://github.com/apache/flink/blob/master/flink-runtime-web/web-dashboard/web/index.html) > {code:|borderStyle=solid} > sizes="16x16"> > > {code} > When referencing the UI from another UI, e.g., the DC/OS UI, relative links > are preferred. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Closed] (FLINK-5466) Make production environment default in gulpfile
[ https://issues.apache.org/jira/browse/FLINK-5466?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ufuk Celebi closed FLINK-5466. -- Resolution: Fixed Assignee: Ufuk Celebi Fix Version/s: 1.1.5 1.3.0 1.2.0 Fixed in 12cf5dc, 4ea52d6 (release-1.1), e55d426, 624f8ae (release-1.2), 408f6ea, e1181f6 (master). > Make production environment default in gulpfile > --- > > Key: FLINK-5466 > URL: https://issues.apache.org/jira/browse/FLINK-5466 > Project: Flink > Issue Type: Improvement > Components: Webfrontend >Affects Versions: 1.2.0, 1.1.4 >Reporter: Ufuk Celebi >Assignee: Ufuk Celebi > Fix For: 1.2.0, 1.3.0, 1.1.5 > > > Currently the default environment set in our gulpfile is development, which > led to very large generated JS files. When building the web UI we apparently > forgot to set the environment to production (build via gulp production). > Since this is likely to occur again, we should make production the default > environment and require development to be enabled manually. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (FLINK-5467) Stateless chained tasks set legacy operator state
Ufuk Celebi created FLINK-5467: -- Summary: Stateless chained tasks set legacy operator state Key: FLINK-5467 URL: https://issues.apache.org/jira/browse/FLINK-5467 Project: Flink Issue Type: Bug Components: State Backends, Checkpointing Reporter: Ufuk Celebi I discovered this while trying to rescale a job with a Kafka source with a chained stateless operator. Looking into it, it turns out that this fails, because the checkpointed state contains legacy operator state for the chained operator although it is stateless. /cc [~aljoscha] You mentioned that this might be a possible duplicate? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (FLINK-5466) Make production environment default in gulpfile
Ufuk Celebi created FLINK-5466: -- Summary: Make production environment default in gulpfile Key: FLINK-5466 URL: https://issues.apache.org/jira/browse/FLINK-5466 Project: Flink Issue Type: Improvement Components: Webfrontend Affects Versions: 1.1.4, 1.2.0 Reporter: Ufuk Celebi Currently the default environment set in our gulpfile is development, which led to very large generated JS files. When building the web UI we apparently forgot to set the environment to production (build via gulp production). Since this is likely to occur again, we should make production the default environment and require development to be enabled manually. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (FLINK-5448) Fix typo in StateAssignmentOperation Exception
Ufuk Celebi created FLINK-5448: -- Summary: Fix typo in StateAssignmentOperation Exception Key: FLINK-5448 URL: https://issues.apache.org/jira/browse/FLINK-5448 Project: Flink Issue Type: Improvement Components: State Backends, Checkpointing Reporter: Ufuk Celebi Priority: Trivial {code} Cannot restore the latest checkpoint because the operator cbc357ccb763df2852fee8c4fc7d55f2 has non-partitioned state and its parallelism changed. The operatorcbc357ccb763df2852fee8c4fc7d55f2 has parallelism 2 whereas the correspondingstate object has a parallelism of 4 {code} White space is missing in some places. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
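Run-together words like "operatorcbc357..." and "correspondingstate" typically come from string literals concatenated across line breaks without a trailing space. A hypothetical reconstruction of the bug and its fix (the message text below is abbreviated, not Flink's exact string):

```java
public class MessageTypoSketch {

    // Buggy concatenation: no space before the interpolated operator id,
    // mirroring the missing whitespace reported in the exception text.
    public static String buggy(String operatorId) {
        return "The operator" + operatorId + " has parallelism 2";
    }

    // Fixed: a trailing space on the literal before the interpolated id.
    public static String fixed(String operatorId) {
        return "The operator " + operatorId + " has parallelism 2";
    }

    public static void main(String[] args) {
        System.out.println(buggy("cbc357")); // The operatorcbc357 has parallelism 2
        System.out.println(fixed("cbc357")); // The operator cbc357 has parallelism 2
    }
}
```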
[jira] [Created] (FLINK-5440) Misleading error message when migrating and scaling down from savepoint
Ufuk Celebi created FLINK-5440: -- Summary: Misleading error message when migrating and scaling down from savepoint Key: FLINK-5440 URL: https://issues.apache.org/jira/browse/FLINK-5440 Project: Flink Issue Type: Improvement Components: State Backends, Checkpointing Reporter: Ufuk Celebi Priority: Minor When resuming from an 1.1 savepoint with 1.2 and reducing the parallelism (and correctly setting the max parallelism), the error message says something about a missing operator which is misleading. Restoring from the same savepoint with the savepoint parallelism works as expected. Instead it should state that this kind of operation is not possible. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (FLINK-5439) Adjust max parallelism when migrating
Ufuk Celebi created FLINK-5439: -- Summary: Adjust max parallelism when migrating Key: FLINK-5439 URL: https://issues.apache.org/jira/browse/FLINK-5439 Project: Flink Issue Type: Improvement Components: State Backends, Checkpointing Reporter: Ufuk Celebi Priority: Minor When migrating from v1 savepoints which don't have the notion of a max parallelism, the job needs to explicitly set the max parallelism to the parallelism of the savepoint. [~stefanrichte...@gmail.com] If this is not trivially implemented, let's close this as won't fix. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (FLINK-5438) Typo in JobGraph generator Exception
Ufuk Celebi created FLINK-5438: -- Summary: Typo in JobGraph generator Exception Key: FLINK-5438 URL: https://issues.apache.org/jira/browse/FLINK-5438 Project: Flink Issue Type: Improvement Components: Client Reporter: Ufuk Celebi Priority: Trivial When trying to run a job with parallelism > max parallelism there is a typo in the error message: {code} Caused by: java.lang.IllegalStateException: The maximum parallelism (1) of the stream node Flat Map-3 is smaller than the parallelism (18). Increase the maximum parallelism or decrease the parallelism >>>ofthis<<< operator. at org.apache.flink.streaming.api.graph.StreamingJobGraphGenerator.createJobVertex(StreamingJobGraphGenerator.java:318) {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (FLINK-5437) Make CheckpointedRestoring error message more detailed
Ufuk Celebi created FLINK-5437: -- Summary: Make CheckpointedRestoring error message more detailed Key: FLINK-5437 URL: https://issues.apache.org/jira/browse/FLINK-5437 Project: Flink Issue Type: Improvement Components: State Backends, Checkpointing Reporter: Ufuk Celebi Priority: Minor When restoring Checkpointed state without implementing CheckpointedRestoring, the job fails with the following Exception: {code} java.lang.Exception: Found UDF state but operator is not instance of CheckpointedRestoring {code} I think we should make this error message more detailed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (FLINK-5436) UDF state without CheckpointedRestoring can result in restarting loop
Ufuk Celebi created FLINK-5436: -- Summary: UDF state without CheckpointedRestoring can result in restarting loop Key: FLINK-5436 URL: https://issues.apache.org/jira/browse/FLINK-5436 Project: Flink Issue Type: Improvement Components: State Backends, Checkpointing Reporter: Ufuk Celebi Priority: Minor When restoring a job with Checkpointed state and not implementing the new CheckpointedRestoring interface, the job will be restarted over and over again (given the respective restarting strategy). Since this is not recoverable, we should immediately fail the job. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Closed] (FLINK-5102) Connection establishment does not react to interrupt
[ https://issues.apache.org/jira/browse/FLINK-5102?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ufuk Celebi closed FLINK-5102. -- Resolution: Cannot Reproduce > Connection establishment does not react to interrupt > > > Key: FLINK-5102 > URL: https://issues.apache.org/jira/browse/FLINK-5102 > Project: Flink > Issue Type: Bug >Affects Versions: 1.1.3 >Reporter: Ufuk Celebi >Assignee: Ufuk Celebi > Fix For: 1.2.0 > > > Connection establishment does not react to interrupts. > {code} > Task - Task '... (60/120)' did not react to cancelling signal, but is stuck > in method: > java.lang.Object.$$YJP$$wait(Native Method) > java.lang.Object.wait(Object.java) > org.apache.flink.runtime.io.network.netty.PartitionRequestClientFactory$ConnectingChannel.waitForChannel(PartitionRequestClientFactory.java:191) > org.apache.flink.runtime.io.network.netty.PartitionRequestClientFactory$ConnectingChannel.access$000(PartitionRequestClientFactory.java:131) > org.apache.flink.runtime.io.network.netty.PartitionRequestClientFactory.createPartitionRequestClient(PartitionRequestClientFactory.java:83) > org.apache.flink.runtime.io.network.netty.NettyConnectionManager.createPartitionRequestClient(NettyConnectionManager.java:60) > org.apache.flink.runtime.io.network.partition.consumer.RemoteInputChannel.requestSubpartition(RemoteInputChannel.java:118) > org.apache.flink.runtime.io.network.partition.consumer.SingleInputGate.requestPartitions(SingleInputGate.java:395) > org.apache.flink.runtime.io.network.partition.consumer.SingleInputGate.getNextBufferOrEvent(SingleInputGate.java:414) > org.apache.flink.streaming.runtime.io.BarrierBuffer.getNextNonBlocked(BarrierBuffer.java:152) > org.apache.flink.streaming.runtime.io.StreamInputProcessor.processInput(StreamInputProcessor.java:195) > org.apache.flink.streaming.runtime.tasks.OneInputStreamTask.run(OneInputStreamTask.java:67) > org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:266) 
> org.apache.flink.runtime.taskmanager.Task.run(Task.java:638) > java.lang.Thread.run(Thread.java:745) > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
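The stuck waitForChannel pattern in the trace can be made cancellation-friendly by letting InterruptedException propagate out of the wait loop instead of being swallowed, and by bounding the wait with a deadline. A hedged sketch of such a loop (not Flink's actual implementation; names are illustrative):

```java
public class InterruptibleWaitSketch {
    private final Object lock = new Object();
    private volatile boolean ready = false;

    // Called when the channel becomes available.
    public void signalReady() {
        synchronized (lock) {
            ready = true;
            lock.notifyAll();
        }
    }

    // Waits for the channel with a deadline. InterruptedException is allowed
    // to propagate, so a cancelling thread's interrupt actually breaks the wait.
    public boolean waitForChannel(long timeoutMillis) throws InterruptedException {
        long deadline = System.currentTimeMillis() + timeoutMillis;
        synchronized (lock) {
            while (!ready) {
                long remaining = deadline - System.currentTimeMillis();
                if (remaining <= 0) {
                    return false; // timed out
                }
                lock.wait(remaining); // interrupt propagates to the caller
            }
        }
        return true;
    }

    public static void main(String[] args) throws Exception {
        InterruptibleWaitSketch w = new InterruptibleWaitSketch();
        w.signalReady();
        System.out.println(w.waitForChannel(10)); // true
    }
}
```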
[jira] [Closed] (FLINK-5306) Display checkpointing configuration details in web UI "Configuration" tab
[ https://issues.apache.org/jira/browse/FLINK-5306?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ufuk Celebi closed FLINK-5306. -- Resolution: Implemented Fix Version/s: 1.3.0 1.2.0 Implemented in 579bc96, dec0d6b, 27df801 (master), 0d1f4bc, 1fd2d2e, 3f2e996 (release-1.2). > Display checkpointing configuration details in web UI "Configuration" tab > - > > Key: FLINK-5306 > URL: https://issues.apache.org/jira/browse/FLINK-5306 > Project: Flink > Issue Type: Improvement > Components: Webfrontend >Affects Versions: 1.1.3 >Reporter: Robert Metzger >Assignee: Ufuk Celebi > Fix For: 1.2.0, 1.3.0 > > > I wanted to check the checkpointing mode from the web UI, but its not > displayed there. > I think there are quite some job-wide settings we can show there. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Closed] (FLINK-4410) Report more information about operator checkpoints
[ https://issues.apache.org/jira/browse/FLINK-4410?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ufuk Celebi closed FLINK-4410. -- Resolution: Implemented Fix Version/s: 1.3.0 All sub tasks have been implemented. > Report more information about operator checkpoints > -- > > Key: FLINK-4410 > URL: https://issues.apache.org/jira/browse/FLINK-4410 > Project: Flink > Issue Type: Improvement > Components: State Backends, Checkpointing, Webfrontend >Affects Versions: 1.1.2 >Reporter: Ufuk Celebi >Assignee: Ufuk Celebi > Fix For: 1.2.0, 1.3.0 > > > Checkpoint statistics contain the duration of a checkpoint, measured as from > the CheckpointCoordinator's start to the point when the acknowledge message > came. > We should additionally expose > - duration of the synchronous part of a checkpoint > - duration of the asynchronous part of a checkpoint > - number of bytes buffered during the stream alignment phase > - duration of the stream alignment phase > Note: In the case of using *at-least once* semantics, the latter two will > always be zero. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Closed] (FLINK-4698) Visualize additional checkpoint information
[ https://issues.apache.org/jira/browse/FLINK-4698?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ufuk Celebi closed FLINK-4698. -- Resolution: Implemented Fix Version/s: 1.3.0 Implemented in 27df801 (master), 3f2e996 (release-1.2). > Visualize additional checkpoint information > --- > > Key: FLINK-4698 > URL: https://issues.apache.org/jira/browse/FLINK-4698 > Project: Flink > Issue Type: Sub-task > Components: Webfrontend >Reporter: Stephan Ewen >Assignee: Ufuk Celebi > Fix For: 1.2.0, 1.3.0 > > > Display the additional information gathered in the {{CheckpointStatsTracker}} > in the "Checkpoint" tab. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Closed] (FLINK-4697) Gather more detailed checkpoint stats in CheckpointStatsTracker
[ https://issues.apache.org/jira/browse/FLINK-4697?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ufuk Celebi closed FLINK-4697. -- Resolution: Implemented Fix Version/s: 1.3.0 Implemented in dec0d6b (master), 1fd2d2e (release-1.2). > Gather more detailed checkpoint stats in CheckpointStatsTracker > --- > > Key: FLINK-4697 > URL: https://issues.apache.org/jira/browse/FLINK-4697 > Project: Flink > Issue Type: Sub-task > Components: State Backends, Checkpointing >Reporter: Stephan Ewen >Assignee: Ufuk Celebi > Fix For: 1.2.0, 1.3.0 > > > The additional information attached to the {{AcknowledgeCheckpoint}} method > must be gathered in the {{CheckpointStatsTracker}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Closed] (FLINK-5066) add more efficient isEvent check to EventSerializer
[ https://issues.apache.org/jira/browse/FLINK-5066?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ufuk Celebi closed FLINK-5066. -- Resolution: Fixed Fix Version/s: 1.3.0 Implemented in 9457465 (master). > add more efficient isEvent check to EventSerializer > --- > > Key: FLINK-5066 > URL: https://issues.apache.org/jira/browse/FLINK-5066 > Project: Flink > Issue Type: Improvement > Components: Network >Reporter: Nico Kruber >Assignee: Nico Kruber > Fix For: 1.3.0 > > > -LocalInputChannel#getNextBuffer de-serialises all incoming events on the > lookout for an EndOfPartitionEvent.- > Some buffer code de-serialises all incoming events on the lookout for an > EndOfPartitionEvent > (now applies to PartitionRequestQueue#isEndOfPartitionEvent()). > Instead, if EventSerializer offered a function to check for an event type > only without de-serialising the whole event, we could save some resources. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
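The idea behind the FLINK-5066 improvement above — checking an event's type without de-serialising the whole event — can be sketched with plain {{ByteBuffer}}s. This is not {{EventSerializer}}'s real API; the tag values and method names are made up for illustration.

```java
import java.nio.ByteBuffer;

// Hedged sketch: if serialized events carry a type tag in their first bytes,
// a consumer can test for a given event class by peeking at the tag instead
// of de-serialising the full event. Tag constants here are hypothetical.
final class EventTagSketch {
    static final int END_OF_PARTITION = 0;
    static final int CHECKPOINT_BARRIER = 1;

    // Serialize an event as: 4-byte type tag, then an arbitrary payload.
    static ByteBuffer serialize(int typeTag, byte[] payload) {
        ByteBuffer buf = ByteBuffer.allocate(4 + payload.length);
        buf.putInt(typeTag).put(payload);
        buf.flip();
        return buf;
    }

    // Cheap check: absolute read of the tag only, buffer position untouched.
    static boolean isEvent(ByteBuffer serialized, int expectedTag) {
        return serialized.getInt(serialized.position()) == expectedTag;
    }
}
```

The saving is that for frequent checks (e.g. looking for an end-of-partition marker) no payload bytes are decoded and no event object is allocated.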
[jira] [Closed] (FLINK-5059) only serialise events once in RecordWriter#broadcastEvent
[ https://issues.apache.org/jira/browse/FLINK-5059?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ufuk Celebi closed FLINK-5059. -- Resolution: Implemented Fix Version/s: 1.3.0 Fixed in 9cff8c9 (master). > only serialise events once in RecordWriter#broadcastEvent > - > > Key: FLINK-5059 > URL: https://issues.apache.org/jira/browse/FLINK-5059 > Project: Flink > Issue Type: Improvement > Components: Network >Reporter: Nico Kruber >Assignee: Nico Kruber > Fix For: 1.3.0 > > > Currently, > org.apache.flink.runtime.io.network.api.writer.RecordWriter#broadcastEvent > serialises the event once per target channel. Instead, it could serialise the > event only once and use the serialised form for every channel and thus save > resources. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
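The serialise-once optimisation described in FLINK-5059 above can be sketched with plain NIO buffers (this is not {{RecordWriter}}'s real code; {{broadcast}} is a hypothetical method): serialise the event a single time and hand every channel a read-only duplicate of the same bytes.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hedged sketch: one serialisation, N lightweight views. duplicate() shares
// the underlying byte array; only position/limit are per-copy, so each
// channel can consume its view independently.
final class BroadcastSketch {
    static List<ByteBuffer> broadcast(String event, int numChannels) {
        // Serialise the event exactly once.
        ByteBuffer serialized =
                ByteBuffer.wrap(event.getBytes(StandardCharsets.UTF_8))
                          .asReadOnlyBuffer();
        List<ByteBuffer> perChannel = new ArrayList<>(numChannels);
        for (int i = 0; i < numChannels; i++) {
            perChannel.add(serialized.duplicate());
        }
        return perChannel;
    }
}
```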
[jira] [Created] (FLINK-5428) Decrease restart delay in RecoveryITCase
Ufuk Celebi created FLINK-5428: -- Summary: Decrease restart delay in RecoveryITCase Key: FLINK-5428 URL: https://issues.apache.org/jira/browse/FLINK-5428 Project: Flink Issue Type: Improvement Components: Tests Reporter: Ufuk Celebi Priority: Minor RecoveryITCase takes around 30 seconds to run, because it uses the default restart delay of 10 seconds for each of its three tests. This can be sped up by decreasing the restart delay in the test. We would have to check whether the tests are sensitive to this. If not, it's simply a setting in the ExecutionConfig. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Closed] (FLINK-4131) Confusing error for out dated RequestPartitionState
[ https://issues.apache.org/jira/browse/FLINK-4131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ufuk Celebi closed FLINK-4131. -- Resolution: Duplicate > Confusing error for out dated RequestPartitionState > --- > > Key: FLINK-4131 > URL: https://issues.apache.org/jira/browse/FLINK-4131 > Project: Flink > Issue Type: Improvement >Reporter: Ufuk Celebi >Assignee: Ufuk Celebi > > When a consumer requests a partition state for an old job or execution > attempt (e.g. failed job or attempt), the JobManager answers with a {{null}} > state, which fails the requesting task with the following cause: > {{IllegalStateException("Received unexpected partition state null for > partition request. This is a bug.")}}. > This is confusing to the user as one might think that this is the root > failure cause. > I propose to either ignore the null state at the Task or not respond on the > JobManager side if the job or execution attempt has been cleared (see > {{RequestPartitionState}} in {{JobManager.scala}}). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Closed] (FLINK-5180) Include blocked on bounded queue length in back pressure stats
[ https://issues.apache.org/jira/browse/FLINK-5180?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ufuk Celebi closed FLINK-5180. -- Resolution: Invalid > Include blocked on bounded queue length in back pressure stats > -- > > Key: FLINK-5180 > URL: https://issues.apache.org/jira/browse/FLINK-5180 > Project: Flink > Issue Type: Improvement >Reporter: Ufuk Celebi >Assignee: Ufuk Celebi > > As a follow up to FLINK-5088, we need to adjust the back pressure stats > tracker to report back pressure when the task is blocked on the introduced > capacity limit. Currently, only blocking buffer requests are accounted for. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (FLINK-4651) Re-register processing time timers at the WindowOperator upon recovery.
[ https://issues.apache.org/jira/browse/FLINK-4651?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ufuk Celebi updated FLINK-4651: --- Fix Version/s: (was: 1.1.4) 1.1.5 > Re-register processing time timers at the WindowOperator upon recovery. > --- > > Key: FLINK-4651 > URL: https://issues.apache.org/jira/browse/FLINK-4651 > Project: Flink > Issue Type: Bug > Components: Windowing Operators >Reporter: Kostas Kloudas >Assignee: Kostas Kloudas > Labels: windows > Fix For: 1.2.0, 1.1.5 > > > Currently the {{WindowOperator}} checkpoints the processing time timers, but > upon recovery it does not re-registers them with the {{TimeServiceProvider}}. > To actually reprocess them it relies on another element that will come and > register a new timer for a future point in time. Although this is a realistic > assumption in long running jobs, we can remove this assumption by > re-registering the restored timers with the {{TimeServiceProvider}} in the > {{open()}} method of the {{WindowOperator}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
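The fix proposed in FLINK-4651 above — re-registering restored processing-time timers in {{open()}} instead of waiting for a new element — can be sketched with made-up types ({{TimerService}} and {{WindowOperatorSketch}} here are hypothetical stand-ins, not Flink's {{TimeServiceProvider}} API):

```java
import java.util.ArrayList;
import java.util.List;

// Hedged sketch: an operator that checkpointed its processing-time timers
// replays each restored timestamp into the timer service when opened, so no
// incoming element is needed to re-arm the timers after recovery.
interface TimerService {
    void registerTimer(long timestamp);
}

final class WindowOperatorSketch {
    private final List<Long> restoredTimers = new ArrayList<>();
    private final TimerService timerService;

    WindowOperatorSketch(TimerService timerService, List<Long> restoredTimers) {
        this.timerService = timerService;
        this.restoredTimers.addAll(restoredTimers);
    }

    // Analogous to the proposed WindowOperator#open(): re-register
    // every timer that was restored from the checkpoint.
    void open() {
        for (long ts : restoredTimers) {
            timerService.registerTimer(ts);
        }
    }
}
```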
[jira] [Updated] (FLINK-5322) Clean up yarn configuration documentation
[ https://issues.apache.org/jira/browse/FLINK-5322?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ufuk Celebi updated FLINK-5322: --- Fix Version/s: (was: 1.1.4) 1.1.5 > Clean up yarn configuration documentation > - > > Key: FLINK-5322 > URL: https://issues.apache.org/jira/browse/FLINK-5322 > Project: Flink > Issue Type: Improvement > Components: Documentation, YARN >Affects Versions: 1.2.0, 1.1.3 > Environment: Flink 1.1.3 on AWS EMR emr-5.2.0 (Hadoop "Amazon 2.7.3") >Reporter: Shannon Carey >Assignee: Chesnay Schepler >Priority: Trivial > Fix For: 1.2.0, 1.1.5 > > > The value I specified in flink-conf.yaml > {code} > yarn.taskmanager.env: > MY_ENV: test > {code} > is not available in {{System.getenv("MY_ENV")}} from the plan execution > (execution flow of main method) nor from within execution of a streaming > operator. > Interestingly, it does appear within the Flink JobManager Web UI under Job > Manager -> Configuration. > -- > The yarn section of the configuration page should be cleaned up a bit. The > "yarn.containers.vcores" parameter is listed twice, the example for > "yarn.application-master.env" is listed as a separate parameter and the > "yarn.taskmanager.env" description indirectly references another parameter > ("same as the above") which just isn't maintainable; instead it should be > described similarly as the application-master entry. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (FLINK-5210) Failing performCheckpoint method causes task to fail
[ https://issues.apache.org/jira/browse/FLINK-5210?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ufuk Celebi updated FLINK-5210: --- Fix Version/s: (was: 1.1.4) 1.1.5 > Failing performCheckpoint method causes task to fail > - > > Key: FLINK-5210 > URL: https://issues.apache.org/jira/browse/FLINK-5210 > Project: Flink > Issue Type: Bug > Components: TaskManager >Affects Versions: 1.2.0, 1.1.3 >Reporter: Till Rohrmann >Assignee: Till Rohrmann > Fix For: 1.2.0, 1.1.5 > > > A failure in {{StreamTask#performCheckpoint}} causes the {{Task}} to fail. > This should not be the case and instead the checkpoint files should be > cleaned up and the current checkpoint should be declined. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (FLINK-5227) Add warning to include flink-table in job fat jars
[ https://issues.apache.org/jira/browse/FLINK-5227?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ufuk Celebi updated FLINK-5227: --- Fix Version/s: (was: 1.1.4) 1.1.5 > Add warning to include flink-table in job fat jars > -- > > Key: FLINK-5227 > URL: https://issues.apache.org/jira/browse/FLINK-5227 > Project: Flink > Issue Type: Improvement > Components: Documentation, Table API & SQL >Affects Versions: 1.2.0, 1.1.3 >Reporter: Fabian Hueske > Fix For: 1.2.0, 1.1.5 > > > {{flink-table}} depends on Apache Calcite which includes a JDBC Driver that > prevents classloaders from being collected. This is a known issue with > {{java.sql.DriverManager}} and can eventually cause PermGen OutOfMemoryError > failures on the TaskManagers. > The current workaround is to not include {{flink-table}} in the fat job JAR. > Instead, the {{flink-table}} jar files should be added to the {{lib}} folder > of the TaskManagers. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
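The workaround described above can be sketched for a Maven build (an assumption; the issue does not prescribe a build tool). Declaring {{flink-table}} with {{provided}} scope keeps it out of the shaded fat jar; the jar then has to be copied to the TaskManagers' {{lib}} folder. The Scala-suffixed artifact id below is illustrative.

```xml
<!-- Hedged sketch, assuming a Maven build: "provided" scope means the
     dependency is available at compile time but is not packaged into the
     shaded fat jar; deploy the flink-table jar to lib/ instead. -->
<dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-table_2.10</artifactId>
    <version>1.1.4</version>
    <scope>provided</scope>
</dependency>
```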
[jira] [Updated] (FLINK-5302) Log failure cause at Execution
[ https://issues.apache.org/jira/browse/FLINK-5302?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ufuk Celebi updated FLINK-5302: --- Fix Version/s: (was: 1.1.4) (was: 1.2.0) > Log failure cause at Execution > --- > > Key: FLINK-5302 > URL: https://issues.apache.org/jira/browse/FLINK-5302 > Project: Flink > Issue Type: Improvement >Reporter: Ufuk Celebi > > It can be helpful to log the failure cause that made an {{Execution}} switch > to state {{FAILED}}. We currently only see a "root cause" logged on the > JobManager, which happens to be the first failure cause that makes it to > {{ExecutionGraph#fail()}}. This depends on relative timings of messages. For > debugging it can be helpful to have all causes available. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (FLINK-5233) Upgrade Jackson version because of class loader leak
[ https://issues.apache.org/jira/browse/FLINK-5233?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ufuk Celebi updated FLINK-5233: --- Fix Version/s: (was: 1.1.4) > Upgrade Jackson version because of class loader leak > > > Key: FLINK-5233 > URL: https://issues.apache.org/jira/browse/FLINK-5233 > Project: Flink > Issue Type: Bug >Affects Versions: 1.1.3 >Reporter: Robert Metzger >Assignee: Robert Metzger >Priority: Critical > > A user reported this issue on the ML: > http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Flink-1-1-3-OOME-Permgen-td10379.html > I propose to upgrade to Jackson 2.7.8, as this version contains the fix for > the issue, but it's not a major Jackson upgrade. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
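The upgrade proposed above was applied in Flink's own build, but the same pinning pattern applies to any Maven project affected by the leak. A hedged sketch (assuming the application, not Flink, is the one pulling in an older 2.7.x Jackson):

```xml
<!-- Hedged sketch, assuming a Maven build: force Jackson 2.7.8, which
     contains the class loader leak fix, without a major version jump. -->
<dependencyManagement>
    <dependencies>
        <dependency>
            <groupId>com.fasterxml.jackson.core</groupId>
            <artifactId>jackson-databind</artifactId>
            <version>2.7.8</version>
        </dependency>
    </dependencies>
</dependencyManagement>
```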
[jira] [Updated] (FLINK-5375) Fix watermark documentation
[ https://issues.apache.org/jira/browse/FLINK-5375?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ufuk Celebi updated FLINK-5375: --- Fix Version/s: (was: 1.1.4) 1.1.5 > Fix watermark documentation > --- > > Key: FLINK-5375 > URL: https://issues.apache.org/jira/browse/FLINK-5375 > Project: Flink > Issue Type: Bug > Components: Documentation, Project Website >Affects Versions: 1.2.0, 1.1.3, 1.3.0 >Reporter: Fabian Hueske >Priority: Critical > Fix For: 1.2.0, 1.3.0, 1.1.5 > > > The [documentation of > watermarks|https://ci.apache.org/projects/flink/flink-docs-release-1.2/dev/event_time.html#event-time-and-watermarks] > is not correct. It states > {quote} > A Watermark(t) declares that event time has reached time t in that stream, > meaning that all events with a timestamps t’ < t have occurred. > {quote} > whereas the JavaDocs which is aligned with implementation says > {quote} > A Watermark tells operators that receive it that no elements with a > timestamp older or equal to the watermark timestamp should arrive at the > operator. > {quote} > The documentation needs to be updated. Moreover, we need to carefully check > that the watermark semantics are correctly described in other pages of the > documentation and blog posts published on the Flink website. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (FLINK-5377) Improve savepoint docs
Ufuk Celebi created FLINK-5377: -- Summary: Improve savepoint docs Key: FLINK-5377 URL: https://issues.apache.org/jira/browse/FLINK-5377 Project: Flink Issue Type: Improvement Components: Documentation Affects Versions: 1.1.3, 1.2.0 Reporter: Ufuk Celebi Assignee: Ufuk Celebi The savepoint docs are very detailed and focus on the internals. They should better convey what users have to take care of. The following questions should be answered:
- What happens if I add a new operator that requires state to my flow?
- What happens if I delete an operator that has state to my flow?
- What happens if I reorder stateful operators in my flow?
- What happens if I add or delete or reorder operators that have no state in my flow?
- Should I apply .uid to all operators in my flow?
- Should I apply .uid to only the operators that have state?
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Closed] (FLINK-5352) Restore RocksDB 1.1.3 memory behavior
[ https://issues.apache.org/jira/browse/FLINK-5352?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ufuk Celebi closed FLINK-5352. -- Resolution: Fixed Fixed in a8b415f, a874ad8 (release-1.1). > Restore RocksDB 1.1.3 memory behavior > - > > Key: FLINK-5352 > URL: https://issues.apache.org/jira/browse/FLINK-5352 > Project: Flink > Issue Type: Bug > Components: State Backends, Checkpointing >Affects Versions: 1.1.4 >Reporter: Stephan Ewen >Assignee: Stephan Ewen > Fix For: 1.1.4 > > > The current Release Candidates for 1.1.4 use an updated RocksDB version with > a changed memory footprint, which makes setups that use a 1.1.3 configuration > unstable. > We should change the "predefined options" in the RocksDB state backend to > reflect the 1.1.3 settings for the 1.1.4 release. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Closed] (FLINK-5326) IllegalStateException: Bug in Netty consumer logic: reader queue got notified by partition about available data, but none was available
[ https://issues.apache.org/jira/browse/FLINK-5326?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ufuk Celebi closed FLINK-5326. -- Resolution: Fixed Fix Version/s: 1.1.4 1.2.0 Fixed in 04db15a, 9ed7752 (release-1.1) and d965d5a, fc62723 (master). > IllegalStateException: Bug in Netty consumer logic: reader queue got notified > by partition about available data, but none was available > > > Key: FLINK-5326 > URL: https://issues.apache.org/jira/browse/FLINK-5326 > Project: Flink > Issue Type: Bug > Components: Network >Affects Versions: 1.2.0, 1.1.4 >Reporter: Robert Metzger >Assignee: Ufuk Celebi > Labels: qa > Fix For: 1.2.0, 1.1.4 > > > {code} > 2016-12-10 23:56:39,056 DEBUG > org.apache.flink.runtime.io.network.partition.ResultPartition - Source: > control events generator (1/40) (3360ced43a57fed83904f22e93281ce0): Releasing > ResultPartition > e585300594a68036b0983cefaf048e17@3360ced43a57fed83904f22e93281ce0 [PIPELINED, > 1 subpartitions, 0 pending references]. > 2016-12-10 23:56:39,056 INFO org.apache.flink.runtime.taskmanager.Task > - dynamic filter (1/40) (b1b7284e0b4a6ba08a16c50dcf13ff0d) > switched from RUNNING to FAILED. > org.apache.flink.runtime.io.network.netty.exception.RemoteTransportException: > Fatal error at remote task manager > 'permanent-qa-cluster-2wv1/10.240.0.27:45062'. 
> at > org.apache.flink.runtime.io.network.netty.PartitionRequestClientHandler.decodeMsg(PartitionRequestClientHandler.java:229) > at > org.apache.flink.runtime.io.network.netty.PartitionRequestClientHandler.channelRead(PartitionRequestClientHandler.java:164) > at > io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339) > at > io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324) > at > io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:103) > at > io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339) > at > io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324) > at > io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:242) > at > io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339) > at > io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324) > at > io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:847) > at > io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:131) > at > io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511) > at > io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468) > at > io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382) > at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354) > at > io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111) > at java.lang.Thread.run(Thread.java:745) > Caused by: java.lang.IllegalStateException: Bug in Netty consumer logic: > reader queue got notified by partition about available data, but none was > available. 
> at > org.apache.flink.runtime.io.network.netty.PartitionRequestQueue.writeAndFlushNextMessageIfPossible(PartitionRequestQueue.java:177) > at > org.apache.flink.runtime.io.network.netty.PartitionRequestQueue.userEventTriggered(PartitionRequestQueue.java:111) > at > io.netty.channel.AbstractChannelHandlerContext.invokeUserEventTriggered(AbstractChannelHandlerContext.java:308) > at > io.netty.channel.AbstractChannelHandlerContext.fireUserEventTriggered(AbstractChannelHandlerContext.java:294) > at > io.netty.channel.ChannelInboundHandlerAdapter.userEventTriggered(ChannelInboundHandlerAdapter.java:108) > at > io.netty.channel.AbstractChannelHandlerContext.invokeUserEventTriggered(AbstractChannelHandlerContext.java:308) > at > io.netty.channel.AbstractChannelHandlerContext.fireUserEventTriggered(AbstractChannelHandlerContext.java:294) > at > io.netty.channel.ChannelInboundHandlerAdapter.userEventTriggered(ChannelInboundHandlerAdapter.java:108) > at > io.netty.channel.AbstractChannelHandlerContext.invokeUserEventTriggered(AbstractChannelHandlerContext.java:308) > at >
[jira] [Commented] (FLINK-5328) ConnectedComponentsITCase testJobWithoutObjectReuse fails
[ https://issues.apache.org/jira/browse/FLINK-5328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15745056#comment-15745056 ] Ufuk Celebi commented on FLINK-5328: The attached logs also show this: {code} 12:00:31,315 ERROR org.apache.flink.runtime.iterative.task.AbstractIterativeTask - Error while shutting down an iterative operator. java.lang.NullPointerException at java.util.ArrayList.addAll(ArrayList.java:559) at org.apache.flink.runtime.operators.hash.ReOpenableMutableHashTable.close(ReOpenableMutableHashTable.java:184) at org.apache.flink.runtime.operators.hash.NonReusingBuildSecondHashJoinIterator.close(NonReusingBuildSecondHashJoinIterator.java:103) at org.apache.flink.runtime.operators.AbstractCachedBuildSideJoinDriver.teardown(AbstractCachedBuildSideJoinDriver.java:213) at org.apache.flink.runtime.iterative.task.AbstractIterativeTask.closeLocalStrategiesAndCaches(AbstractIterativeTask.java:164) at org.apache.flink.runtime.operators.BatchTask.invoke(BatchTask.java:359) at org.apache.flink.runtime.taskmanager.Task.run(Task.java:654) at java.lang.Thread.run(Thread.java:745) {code} > ConnectedComponentsITCase testJobWithoutObjectReuse fails > - > > Key: FLINK-5328 > URL: https://issues.apache.org/jira/browse/FLINK-5328 > Project: Flink > Issue Type: Test >Reporter: Ufuk Celebi > Labels: test-stability > Attachments: connectedComponentsFailure.txt > > > I've seen this fail a couple of times now: > ConnectedComponentsITCase#testJobWithoutObjectReuse. > {code} > Job execution failed. > org.apache.flink.runtime.client.JobExecutionException: Job execution failed. 
> at > org.apache.flink.runtime.jobmanager.JobManager$$anonfun$handleMessage$1$$anonfun$applyOrElse$6.apply$mcV$sp(JobManager.scala:900) > at > org.apache.flink.runtime.jobmanager.JobManager$$anonfun$handleMessage$1$$anonfun$applyOrElse$6.apply(JobManager.scala:843) > at > org.apache.flink.runtime.jobmanager.JobManager$$anonfun$handleMessage$1$$anonfun$applyOrElse$6.apply(JobManager.scala:843) > at > scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24) > at > scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24) > at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:41) > at > akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:401) > at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260) > at > scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339) > at > scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979) > at > scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107) > Caused by: java.io.IOException: Stream Closed > at java.io.FileInputStream.readBytes(Native Method) > at java.io.FileInputStream.read(FileInputStream.java:272) > at > org.apache.flink.core.fs.local.LocalDataInputStream.read(LocalDataInputStream.java:72) > at > org.apache.flink.core.fs.FSDataInputStreamWrapper.read(FSDataInputStreamWrapper.java:59) > at > org.apache.flink.api.common.io.DelimitedInputFormat.fillBuffer(DelimitedInputFormat.java:662) > at > org.apache.flink.api.common.io.DelimitedInputFormat.readLine(DelimitedInputFormat.java:556) > at > org.apache.flink.api.common.io.DelimitedInputFormat.nextRecord(DelimitedInputFormat.java:522) > at > org.apache.flink.api.java.io.CsvInputFormat.nextRecord(CsvInputFormat.java:78) > at > org.apache.flink.runtime.operators.DataSourceTask.invoke(DataSourceTask.java:166) > at org.apache.flink.runtime.taskmanager.Task.run(Task.java:654) > at 
java.lang.Thread.run(Thread.java:745) > {code} > Complete logs are attached. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (FLINK-5328) ConnectedComponentsITCase testJobWithoutObjectReuse fails
[ https://issues.apache.org/jira/browse/FLINK-5328?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ufuk Celebi updated FLINK-5328: --- Attachment: connectedComponentsFailure.txt > ConnectedComponentsITCase testJobWithoutObjectReuse fails > - > > Key: FLINK-5328 > URL: https://issues.apache.org/jira/browse/FLINK-5328 > Project: Flink > Issue Type: Test >Reporter: Ufuk Celebi > Labels: test-stability > Attachments: connectedComponentsFailure.txt > > > I've seen this fail a couple of times now: > ConnectedComponentsITCase#testJobWithoutObjectReuse. > {code} > Job execution failed. > org.apache.flink.runtime.client.JobExecutionException: Job execution failed. > at > org.apache.flink.runtime.jobmanager.JobManager$$anonfun$handleMessage$1$$anonfun$applyOrElse$6.apply$mcV$sp(JobManager.scala:900) > at > org.apache.flink.runtime.jobmanager.JobManager$$anonfun$handleMessage$1$$anonfun$applyOrElse$6.apply(JobManager.scala:843) > at > org.apache.flink.runtime.jobmanager.JobManager$$anonfun$handleMessage$1$$anonfun$applyOrElse$6.apply(JobManager.scala:843) > at > scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24) > at > scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24) > at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:41) > at > akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:401) > at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260) > at > scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339) > at > scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979) > at > scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107) > Caused by: java.io.IOException: Stream Closed > at java.io.FileInputStream.readBytes(Native Method) > at java.io.FileInputStream.read(FileInputStream.java:272) > at > 
org.apache.flink.core.fs.local.LocalDataInputStream.read(LocalDataInputStream.java:72) > at > org.apache.flink.core.fs.FSDataInputStreamWrapper.read(FSDataInputStreamWrapper.java:59) > at > org.apache.flink.api.common.io.DelimitedInputFormat.fillBuffer(DelimitedInputFormat.java:662) > at > org.apache.flink.api.common.io.DelimitedInputFormat.readLine(DelimitedInputFormat.java:556) > at > org.apache.flink.api.common.io.DelimitedInputFormat.nextRecord(DelimitedInputFormat.java:522) > at > org.apache.flink.api.java.io.CsvInputFormat.nextRecord(CsvInputFormat.java:78) > at > org.apache.flink.runtime.operators.DataSourceTask.invoke(DataSourceTask.java:166) > at org.apache.flink.runtime.taskmanager.Task.run(Task.java:654) > at java.lang.Thread.run(Thread.java:745) > {code} > Complete logs are attached. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (FLINK-5328) ConnectedComponentsITCase testJobWithoutObjectReuse fails
Ufuk Celebi created FLINK-5328: -- Summary: ConnectedComponentsITCase testJobWithoutObjectReuse fails Key: FLINK-5328 URL: https://issues.apache.org/jira/browse/FLINK-5328 Project: Flink Issue Type: Test Reporter: Ufuk Celebi I've seen this fail a couple of times now: ConnectedComponentsITCase#testJobWithoutObjectReuse. {code} Job execution failed. org.apache.flink.runtime.client.JobExecutionException: Job execution failed. at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$handleMessage$1$$anonfun$applyOrElse$6.apply$mcV$sp(JobManager.scala:900) at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$handleMessage$1$$anonfun$applyOrElse$6.apply(JobManager.scala:843) at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$handleMessage$1$$anonfun$applyOrElse$6.apply(JobManager.scala:843) at scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24) at scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24) at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:41) at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:401) at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260) at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339) at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979) at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107) Caused by: java.io.IOException: Stream Closed at java.io.FileInputStream.readBytes(Native Method) at java.io.FileInputStream.read(FileInputStream.java:272) at org.apache.flink.core.fs.local.LocalDataInputStream.read(LocalDataInputStream.java:72) at org.apache.flink.core.fs.FSDataInputStreamWrapper.read(FSDataInputStreamWrapper.java:59) at org.apache.flink.api.common.io.DelimitedInputFormat.fillBuffer(DelimitedInputFormat.java:662) at org.apache.flink.api.common.io.DelimitedInputFormat.readLine(DelimitedInputFormat.java:556) at 
org.apache.flink.api.common.io.DelimitedInputFormat.nextRecord(DelimitedInputFormat.java:522) at org.apache.flink.api.java.io.CsvInputFormat.nextRecord(CsvInputFormat.java:78) at org.apache.flink.runtime.operators.DataSourceTask.invoke(DataSourceTask.java:166) at org.apache.flink.runtime.taskmanager.Task.run(Task.java:654) at java.lang.Thread.run(Thread.java:745) {code} Complete logs are attached. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (FLINK-5326) IllegalStateException: Bug in Netty consumer logic: reader queue got notified by partition about available data, but none was available
[ https://issues.apache.org/jira/browse/FLINK-5326?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ufuk Celebi updated FLINK-5326: --- Affects Version/s: 1.2.0 > IllegalStateException: Bug in Netty consumer logic: reader queue got notified > by partition about available data, but none was available > > > Key: FLINK-5326 > URL: https://issues.apache.org/jira/browse/FLINK-5326 > Project: Flink > Issue Type: Bug > Components: Network >Affects Versions: 1.2.0, 1.1.4 >Reporter: Robert Metzger >Assignee: Ufuk Celebi > Labels: qa > > {code} > 2016-12-10 23:56:39,056 DEBUG > org.apache.flink.runtime.io.network.partition.ResultPartition - Source: > control events generator (1/40) (3360ced43a57fed83904f22e93281ce0): Releasing > ResultPartition > e585300594a68036b0983cefaf048e17@3360ced43a57fed83904f22e93281ce0 [PIPELINED, > 1 subpartitions, 0 pending references]. > 2016-12-10 23:56:39,056 INFO org.apache.flink.runtime.taskmanager.Task > - dynamic filter (1/40) (b1b7284e0b4a6ba08a16c50dcf13ff0d) > switched from RUNNING to FAILED. > org.apache.flink.runtime.io.network.netty.exception.RemoteTransportException: > Fatal error at remote task manager > 'permanent-qa-cluster-2wv1/10.240.0.27:45062'. 
> at > org.apache.flink.runtime.io.network.netty.PartitionRequestClientHandler.decodeMsg(PartitionRequestClientHandler.java:229) > at > org.apache.flink.runtime.io.network.netty.PartitionRequestClientHandler.channelRead(PartitionRequestClientHandler.java:164) > at > io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339) > at > io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324) > at > io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:103) > at > io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339) > at > io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324) > at > io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:242) > at > io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339) > at > io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324) > at > io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:847) > at > io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:131) > at > io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511) > at > io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468) > at > io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382) > at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354) > at > io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111) > at java.lang.Thread.run(Thread.java:745) > Caused by: java.lang.IllegalStateException: Bug in Netty consumer logic: > reader queue got notified by partition about available data, but none was > available. 
> at > org.apache.flink.runtime.io.network.netty.PartitionRequestQueue.writeAndFlushNextMessageIfPossible(PartitionRequestQueue.java:177) > at > org.apache.flink.runtime.io.network.netty.PartitionRequestQueue.userEventTriggered(PartitionRequestQueue.java:111) > at > io.netty.channel.AbstractChannelHandlerContext.invokeUserEventTriggered(AbstractChannelHandlerContext.java:308) > at > io.netty.channel.AbstractChannelHandlerContext.fireUserEventTriggered(AbstractChannelHandlerContext.java:294) > at > io.netty.channel.ChannelInboundHandlerAdapter.userEventTriggered(ChannelInboundHandlerAdapter.java:108) > at > io.netty.channel.AbstractChannelHandlerContext.invokeUserEventTriggered(AbstractChannelHandlerContext.java:308) > at > io.netty.channel.AbstractChannelHandlerContext.fireUserEventTriggered(AbstractChannelHandlerContext.java:294) > at > io.netty.channel.ChannelInboundHandlerAdapter.userEventTriggered(ChannelInboundHandlerAdapter.java:108) > at > io.netty.channel.AbstractChannelHandlerContext.invokeUserEventTriggered(AbstractChannelHandlerContext.java:308) > at > io.netty.channel.AbstractChannelHandlerContext.fireUserEventTriggered(AbstractChannelHandlerContext.java:294) > at >
[jira] [Commented] (FLINK-5326) IllegalStateException: Bug in Netty consumer logic: reader queue got notified by partition about available data, but none was available
[ https://issues.apache.org/jira/browse/FLINK-5326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15744744#comment-15744744 ] Ufuk Celebi commented on FLINK-5326: No, 1.2.0 too. > IllegalStateException: Bug in Netty consumer logic: reader queue got notified > by partition about available data, but none was available > > > Key: FLINK-5326 > URL: https://issues.apache.org/jira/browse/FLINK-5326 > Project: Flink > Issue Type: Bug > Components: Network >Affects Versions: 1.2.0, 1.1.4 >Reporter: Robert Metzger >Assignee: Ufuk Celebi > Labels: qa > > {code} > 2016-12-10 23:56:39,056 DEBUG > org.apache.flink.runtime.io.network.partition.ResultPartition - Source: > control events generator (1/40) (3360ced43a57fed83904f22e93281ce0): Releasing > ResultPartition > e585300594a68036b0983cefaf048e17@3360ced43a57fed83904f22e93281ce0 [PIPELINED, > 1 subpartitions, 0 pending references]. > 2016-12-10 23:56:39,056 INFO org.apache.flink.runtime.taskmanager.Task > - dynamic filter (1/40) (b1b7284e0b4a6ba08a16c50dcf13ff0d) > switched from RUNNING to FAILED. > org.apache.flink.runtime.io.network.netty.exception.RemoteTransportException: > Fatal error at remote task manager > 'permanent-qa-cluster-2wv1/10.240.0.27:45062'. 
> at org.apache.flink.runtime.io.network.netty.PartitionRequestClientHandler.decodeMsg(PartitionRequestClientHandler.java:229)
> at org.apache.flink.runtime.io.network.netty.PartitionRequestClientHandler.channelRead(PartitionRequestClientHandler.java:164)
> at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339)
> at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324)
> at io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:103)
> at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339)
> at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324)
> at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:242)
> at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339)
> at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324)
> at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:847)
> at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:131)
> at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511)
> at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)
> at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)
> at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)
> at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
> at java.lang.Thread.run(Thread.java:745)
> Caused by: java.lang.IllegalStateException: Bug in Netty consumer logic: reader queue got notified by partition about available data, but none was available.
> at org.apache.flink.runtime.io.network.netty.PartitionRequestQueue.writeAndFlushNextMessageIfPossible(PartitionRequestQueue.java:177)
> at org.apache.flink.runtime.io.network.netty.PartitionRequestQueue.userEventTriggered(PartitionRequestQueue.java:111)
> at io.netty.channel.AbstractChannelHandlerContext.invokeUserEventTriggered(AbstractChannelHandlerContext.java:308)
> at io.netty.channel.AbstractChannelHandlerContext.fireUserEventTriggered(AbstractChannelHandlerContext.java:294)
> at io.netty.channel.ChannelInboundHandlerAdapter.userEventTriggered(ChannelInboundHandlerAdapter.java:108)
> at io.netty.channel.AbstractChannelHandlerContext.invokeUserEventTriggered(AbstractChannelHandlerContext.java:308)
> at io.netty.channel.AbstractChannelHandlerContext.fireUserEventTriggered(AbstractChannelHandlerContext.java:294)
> at io.netty.channel.ChannelInboundHandlerAdapter.userEventTriggered(ChannelInboundHandlerAdapter.java:108)
> at io.netty.channel.AbstractChannelHandlerContext.invokeUserEventTriggered(AbstractChannelHandlerContext.java:308)
> at io.netty.channel.AbstractChannelHandlerContext.fireUserEventTriggered(AbstractChannelHandlerContext.java:294)
> at >
[jira] [Closed] (FLINK-5007) Retain externalized checkpoint on suspension
[ https://issues.apache.org/jira/browse/FLINK-5007?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ufuk Celebi closed FLINK-5007. -- Resolution: Fixed Fixed in 38ab716 (master). > Retain externalized checkpoint on suspension > > > Key: FLINK-5007 > URL: https://issues.apache.org/jira/browse/FLINK-5007 > Project: Flink > Issue Type: Bug > Components: State Backends, Checkpointing >Reporter: Ufuk Celebi >Assignee: Ufuk Celebi > Fix For: 1.2.0 > > > Externalized checkpoints are cleaned up when the job is suspended. > Suspensions happen on graceful shut down (non-HA) or loss of leadership (HA). > In case of HA, the checkpoint store does not clean up any checkpoints as they > might be recovered by a new leader. The only way to stop a HA job is to > actually cancel it. Therefore the configured clean up behaviour doesn't > matter. > In case of non-HA, suspensions happen because of graceful shut down (for > example stopping a YARN session). In this case I would treat the clean up > behaviour similar to cancelling the job. > {code} > ExternalizedCheckpointCleanup.DELETE_ON_CANCELLATION => delete on suspension > ExternalizedCheckpointCleanup.RETAIN_ON_CANCELLATION => retain on suspension > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
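The cleanup mapping proposed in FLINK-5007 (delete/retain on suspension mirroring the cancellation setting) can be sketched as follows. The enum value names match the ticket's {code} block; the class and method names are made up for illustration and are not Flink's actual implementation.

```java
// Illustrative sketch of the behaviour proposed in FLINK-5007; only the
// ExternalizedCheckpointCleanup value names come from the ticket.
enum ExternalizedCheckpointCleanup { DELETE_ON_CANCELLATION, RETAIN_ON_CANCELLATION }

enum JobStatus { CANCELED, SUSPENDED, FINISHED }

class SuspensionCleanupSketch {
    // Proposed rule: treat suspension exactly like cancellation, so an
    // externalized checkpoint is discarded iff DELETE_ON_CANCELLATION is set.
    static boolean shouldDiscard(JobStatus status, ExternalizedCheckpointCleanup cleanup) {
        switch (status) {
            case CANCELED:
            case SUSPENDED: // the change requested by this ticket
                return cleanup == ExternalizedCheckpointCleanup.DELETE_ON_CANCELLATION;
            default:
                return false; // FINISHED etc.: normal completed-job cleanup applies
        }
    }
}
```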
[jira] [Closed] (FLINK-5088) Add option to limit subpartition queue length
[ https://issues.apache.org/jira/browse/FLINK-5088?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ufuk Celebi closed FLINK-5088. -- Resolution: Won't Fix > Add option to limit subpartition queue length > - > > Key: FLINK-5088 > URL: https://issues.apache.org/jira/browse/FLINK-5088 > Project: Flink > Issue Type: Improvement > Components: Network >Affects Versions: 1.1.3 >Reporter: Ufuk Celebi >Assignee: Ufuk Celebi > > Currently the sub partition queues are not bounded. Queued buffers are > consumed by the network event loop or local consumers. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
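Although FLINK-5088 was closed as Won't Fix, the idea it describes (a capacity limit on the otherwise unbounded subpartition queue) can be sketched as a queue that rejects additions when full, letting the producer back off instead of queueing unboundedly. All names here are hypothetical, not Flink's actual classes.

```java
import java.util.ArrayDeque;

// Illustrative sketch of a bounded subpartition queue (FLINK-5088); the
// producer learns whether the add succeeded so it can apply backpressure.
class BoundedSubpartitionQueue<T> {
    private final ArrayDeque<T> queue = new ArrayDeque<>();
    private final int capacity;

    BoundedSubpartitionQueue(int capacity) { this.capacity = capacity; }

    synchronized boolean offer(T buffer) {
        if (queue.size() >= capacity) {
            return false; // queue full: caller must back off or retry
        }
        queue.add(buffer);
        return true;
    }

    synchronized T poll() { return queue.poll(); } // consumed by the event loop or local consumers

    synchronized int size() { return queue.size(); }
}
```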
[jira] [Closed] (FLINK-5114) PartitionState update with finished execution fails
[ https://issues.apache.org/jira/browse/FLINK-5114?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ufuk Celebi closed FLINK-5114. -- Resolution: Fixed Fix Version/s: 1.1.4 1.2.0 Fixed in a078666 (master), 2b612f2 (release-1.1) > PartitionState update with finished execution fails > --- > > Key: FLINK-5114 > URL: https://issues.apache.org/jira/browse/FLINK-5114 > Project: Flink > Issue Type: Bug > Components: Network >Reporter: Ufuk Celebi >Assignee: Ufuk Celebi > Fix For: 1.2.0, 1.1.4 > > > If a partition state request is triggered for a producer that finishes before > the request arrives, the execution is unregistered and the producer cannot be > found. In this case the PartitionState returns null and the job fails. > We need to check the producer location via the intermediate result partition > in this case. > See here: https://api.travis-ci.org/jobs/177668505/log.txt?deansi=true -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (FLINK-5264) Improve error message when assigning uid to intermediate operator
[ https://issues.apache.org/jira/browse/FLINK-5264?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ufuk Celebi updated FLINK-5264: --- Priority: Major (was: Blocker) > Improve error message when assigning uid to intermediate operator > - > > Key: FLINK-5264 > URL: https://issues.apache.org/jira/browse/FLINK-5264 > Project: Flink > Issue Type: Bug >Affects Versions: 1.1.3 >Reporter: Kostas Kloudas >Assignee: Ufuk Celebi > Fix For: 1.2.0 > > > Currently when trying to assign uid to an intermediate operator in a chain, > the error message just says that it is not the right place, without any > further information. > We could improve the message by telling explicitly to the user on which > operator to assign the uid. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Closed] (FLINK-4797) Add Flink 1.1 savepoint backwards compatability
[ https://issues.apache.org/jira/browse/FLINK-4797?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ufuk Celebi closed FLINK-4797. -- Resolution: Duplicate > Add Flink 1.1 savepoint backwards compatability > --- > > Key: FLINK-4797 > URL: https://issues.apache.org/jira/browse/FLINK-4797 > Project: Flink > Issue Type: Improvement > Components: State Backends, Checkpointing >Reporter: Ufuk Celebi >Assignee: Ufuk Celebi > > Make sure that we can resume from Flink 1.1 savepoints on the job manager > side. This means that we need to be able to read the savepoint header file on > the job manager and create a completed checkpoint from it. > This will not yet mean that we can resume from 1.1 savepoints as the > operator/task manager side compatability is missing. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-5041) Implement savepoint backwards compatibility 1.1 -> 1.2
[ https://issues.apache.org/jira/browse/FLINK-5041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15741496#comment-15741496 ] Ufuk Celebi commented on FLINK-5041: This issue/sub tasks of this issue duplicate FLINK-4797. Should have been closed before creating these. > Implement savepoint backwards compatibility 1.1 -> 1.2 > -- > > Key: FLINK-5041 > URL: https://issues.apache.org/jira/browse/FLINK-5041 > Project: Flink > Issue Type: New Feature > Components: State Backends, Checkpointing >Affects Versions: 1.2.0 >Reporter: Stefan Richter >Assignee: Stefan Richter > > This issue tracks the implementation of backwards compatibility between Flink > 1.1 and 1.2 releases. > This task subsumes: > - Converting old savepoints to new savepoints, including a conversion of > state handles to their new replacement. > - Converting keyed state from old backend implementations to their new > counterparts. > - Converting operator and function state for all changed operators. > - Ensure backwards compatibility of the hashes used to generate JobVertexIds. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-5306) Display checkpointing configuration details in web UI "Configuration" tab
[ https://issues.apache.org/jira/browse/FLINK-5306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15735100#comment-15735100 ] Ufuk Celebi commented on FLINK-5306: No, didn't open an issue yet, but I'll make this a sub task. > Display checkpointing configuration details in web UI "Configuration" tab > - > > Key: FLINK-5306 > URL: https://issues.apache.org/jira/browse/FLINK-5306 > Project: Flink > Issue Type: Improvement > Components: Webfrontend >Affects Versions: 1.1.3 >Reporter: Robert Metzger >Assignee: Ufuk Celebi > > I wanted to check the checkpointing mode from the web UI, but its not > displayed there. > I think there are quite some job-wide settings we can show there. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-5306) Display checkpointing configuration details in web UI "Configuration" tab
[ https://issues.apache.org/jira/browse/FLINK-5306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15735071#comment-15735071 ] Ufuk Celebi commented on FLINK-5306: I planned to make this part of the stats refactoring. Stay tuned... ;) > Display checkpointing configuration details in web UI "Configuration" tab > - > > Key: FLINK-5306 > URL: https://issues.apache.org/jira/browse/FLINK-5306 > Project: Flink > Issue Type: Improvement > Components: Webfrontend >Affects Versions: 1.1.3 >Reporter: Robert Metzger >Assignee: Ufuk Celebi > > I wanted to check the checkpointing mode from the web UI, but its not > displayed there. > I think there are quite some job-wide settings we can show there. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (FLINK-5306) Display checkpointing configuration details in web UI "Configuration" tab
[ https://issues.apache.org/jira/browse/FLINK-5306?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ufuk Celebi reassigned FLINK-5306: -- Assignee: Ufuk Celebi > Display checkpointing configuration details in web UI "Configuration" tab > - > > Key: FLINK-5306 > URL: https://issues.apache.org/jira/browse/FLINK-5306 > Project: Flink > Issue Type: Improvement > Components: Webfrontend >Affects Versions: 1.1.3 >Reporter: Robert Metzger >Assignee: Ufuk Celebi > > I wanted to check the checkpointing mode from the web UI, but its not > displayed there. > I think there are quite some job-wide settings we can show there. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (FLINK-5302) Log failure cause at Execution
Ufuk Celebi created FLINK-5302: -- Summary: Log failure cause at Execution Key: FLINK-5302 URL: https://issues.apache.org/jira/browse/FLINK-5302 Project: Flink Issue Type: Improvement Reporter: Ufuk Celebi Fix For: 1.2.0, 1.1.4 It can be helpful to log the failure cause that made an {{Execution}} switch to state {{FAILED}}. We currently only see a "root cause" logged on the JobManager, which happens to be the first failure cause that makes it to {{ExecutionGraph#fail()}}. This depends on relative timings of messages. For debugging it can be helpful to have all causes available. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
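The improvement FLINK-5302 asks for can be sketched as a log line built per state transition, attaching each execution's own failure cause rather than only the first cause that reaches {{ExecutionGraph#fail()}}. The class and method names below are illustrative only.

```java
// Sketch of the per-Execution failure logging proposed in FLINK-5302.
class ExecutionStateLoggingSketch {
    enum State { RUNNING, FAILED }

    // Build the log message for a state transition, including the cause
    // whenever the target state is FAILED.
    static String describeTransition(String taskName, State from, State to, Throwable cause) {
        StringBuilder msg = new StringBuilder()
            .append(taskName).append(" switched from ").append(from).append(" to ").append(to);
        if (to == State.FAILED && cause != null) {
            // Log every execution's cause, not just the first one reported.
            msg.append(" with failure cause: ").append(cause);
        }
        return msg.toString();
    }
}
```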
[jira] [Closed] (FLINK-3902) Discarded FileSystem checkpoints are lingering around
[ https://issues.apache.org/jira/browse/FLINK-3902?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ufuk Celebi closed FLINK-3902. -- Resolution: Not A Bug This is actually an artifact of the way we try to delete files and how this interacts with HDFS (which logs this as an Exception). > Discarded FileSystem checkpoints are lingering around > - > > Key: FLINK-3902 > URL: https://issues.apache.org/jira/browse/FLINK-3902 > Project: Flink > Issue Type: Bug > Components: Distributed Coordination >Affects Versions: 1.0.2 >Reporter: Ufuk Celebi > > A user reported that checkpoints with {{FSStateBackend}} are not properly > cleaned up. > {code} > 2016-05-10 12:21:06,559 INFO BlockStateChange: BLOCK* addToInvalidates: > blk_1084791727_11053122 10.10.113.10:50010 > 2016-05-10 12:21:06,559 INFO org.apache.hadoop.ipc.Server: IPC Server handler > 9 on 8020, call org.apache.hadoop.hdfs.protocol.ClientProtocol.delete from > 10.10.113.9:49233 Call#12337 Retry#0 > org.apache.hadoop.fs.PathIsNotEmptyDirectoryException: > `/flink/checkpoints_test/570d6e67d571c109daab468e5678402b/chk-62 is non > empty': Directory is not empty > at > org.apache.hadoop.hdfs.server.namenode.FSDirDeleteOp.delete(FSDirDeleteOp.java:85) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.delete(FSNamesystem.java:3712) > {code} > {code} > 2016-05-10 12:20:22,636 [Checkpoint Timer] INFO > org.apache.flink.runtime.checkpoint.CheckpointCoordinator - Triggering > checkpoint 62 @ 1462875622636 > 2016-05-10 12:20:32,507 [flink-akka.actor.default-dispatcher-240088] INFO > org.apache.flink.runtime.checkpoint.CheckpointCoordinator - Completed > checkpoint 62 (in 9843 ms) > 2016-05-10 12:20:52,637 [Checkpoint Timer] INFO > org.apache.flink.runtime.checkpoint.CheckpointCoordinator - Triggering > checkpoint 63 @ 1462875652637 > 2016-05-10 12:21:06,563 [flink-akka.actor.default-dispatcher-240028] INFO > org.apache.flink.runtime.checkpoint.CheckpointCoordinator - Completed > checkpoint 63 (in 
13909 ms) > 2016-05-10 12:21:22,636 [Checkpoint Timer] INFO > org.apache.flink.runtime.checkpoint.CheckpointCoordinator - Triggering > checkpoint 64 @ 1462875682636 > {code} > Running the same program with the {{RocksDBBackend}} works as expected and > clears the old checkpoints properly. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-5278) Improve Task and checkpoint logging
[ https://issues.apache.org/jira/browse/FLINK-5278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15729286#comment-15729286 ] Ufuk Celebi commented on FLINK-5278: Merged in b046038 (release-1.1). Merge for master pending. > Improve Task and checkpoint logging > > > Key: FLINK-5278 > URL: https://issues.apache.org/jira/browse/FLINK-5278 > Project: Flink > Issue Type: Improvement > Components: State Backends, Checkpointing, TaskManager >Affects Versions: 1.2.0, 1.1.3 >Reporter: Till Rohrmann >Assignee: Till Rohrmann >Priority: Minor > Fix For: 1.2.0, 1.1.4 > > > The logging of task and checkpoint logic could be improved to contain more > information relevant for debugging. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (FLINK-5279) Improve error message when trying to access keyed state in non-keyed operator
Ufuk Celebi created FLINK-5279: -- Summary: Improve error message when trying to access keyed state in non-keyed operator Key: FLINK-5279 URL: https://issues.apache.org/jira/browse/FLINK-5279 Project: Flink Issue Type: Improvement Affects Versions: 1.1.3 Reporter: Ufuk Celebi When trying to access keyed state in a non-keyed operator, the error message is not very helpful. You get a trace like this: {code} java.lang.RuntimeException: Error while getting state ... Caused by: java.lang.RuntimeException: State key serializer has not been configured in the config. This operation cannot use partitioned state. {code} It would be helpful if the message stated explicitly that this API can only be used on keyed streams, etc. If this applies to the current master as well, we should fix it there, too. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
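The clearer error message FLINK-5279 asks for could look like the guard below: instead of surfacing the internal symptom ("state key serializer has not been configured"), it names the actual fix. Class, field, and method names are hypothetical, not Flink's actual API.

```java
// Sketch of a more explicit keyed-state access check (FLINK-5279).
class KeyedStateGuard {
    private final Object keySerializer; // null when the operator is not on a keyed stream

    KeyedStateGuard(Object keySerializer) { this.keySerializer = keySerializer; }

    void checkKeyedStateAccessible(String stateName) {
        if (keySerializer == null) {
            // Tell the user what to do, not which internal field was missing.
            throw new IllegalStateException(
                "Keyed state '" + stateName + "' can only be used on a KeyedStream. "
                    + "Apply keyBy(...) before the operator that accesses this state.");
        }
    }
}
```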
[jira] [Closed] (FLINK-5274) LocalInputChannel throws NPE if partition reader is released
[ https://issues.apache.org/jira/browse/FLINK-5274?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ufuk Celebi closed FLINK-5274. -- Resolution: Fixed Fix Version/s: 1.1.4 1.2.0 555a687 (master), 1b472d2 (release-1.1) > LocalInputChannel throws NPE if partition reader is released > > > Key: FLINK-5274 > URL: https://issues.apache.org/jira/browse/FLINK-5274 > Project: Flink > Issue Type: Bug > Components: Network >Reporter: Ufuk Celebi >Assignee: Ufuk Celebi > Fix For: 1.2.0, 1.1.4 > > > Reported by [~rmetzger]: > {code} > java.lang.NullPointerException >at > org.apache.flink.runtime.io.network.partition.consumer.LocalInputChannel.getNextBuffer(LocalInputChannel.java:185) >at > org.apache.flink.runtime.io.network.partition.consumer.SingleInputGate.getNextBufferOrEvent(SingleInputGate.java:444) >at > org.apache.flink.streaming.runtime.io.BarrierBuffer.getNextNonBlocked(BarrierBuffer.java:152) >at > org.apache.flink.streaming.runtime.io.StreamInputProcessor.processInput(StreamInputProcessor.java:195) >at > org.apache.flink.streaming.runtime.tasks.OneInputStreamTask.run(OneInputStreamTask.java:67) >at > org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:267) >at org.apache.flink.runtime.taskmanager.Task.run(Task.java:638) >at java.lang.Thread.run(Thread.java:745) > {code} > This is most likely caused by the result partition being released after the > notifications have happened. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Closed] (FLINK-5275) InputChanelDeploymentDescriptors throws misleading Exception if producer failed/cancelled
[ https://issues.apache.org/jira/browse/FLINK-5275?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ufuk Celebi closed FLINK-5275. -- Resolution: Fixed Fix Version/s: 1.1.4 1.2.0 4410c04 (master), 4526005 (release-1.1) > InputChanelDeploymentDescriptors throws misleading Exception if producer > failed/cancelled > - > > Key: FLINK-5275 > URL: https://issues.apache.org/jira/browse/FLINK-5275 > Project: Flink > Issue Type: Bug > Components: Network >Reporter: Ufuk Celebi >Assignee: Ufuk Celebi > Fix For: 1.2.0, 1.1.4 > > > Reported by [~rmetzger]. While creating the input channel deployment > descriptors for a job that does not allow lazy deployment (all streaming > jobs), Robert got an {{ExecutionGraphException}} with the message {{Trying to > eagerly schedule a task whose inputs are not ready}} in the > {{InputChannelDeploymentDescriptor}}. > This was not reported as the root cause and is most likely due to a failed > producer. For such cases, it makes sense to include the execution state of > the producing task so that users see that this is related to the producer. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Closed] (FLINK-5276) ExecutionVertex archiving can throw NPE with many previous attempts
[ https://issues.apache.org/jira/browse/FLINK-5276?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ufuk Celebi closed FLINK-5276. -- Resolution: Fixed Fixed in 75b48e (release-1.1) > ExecutionVertex archiving can throw NPE with many previous attempts > --- > > Key: FLINK-5276 > URL: https://issues.apache.org/jira/browse/FLINK-5276 > Project: Flink > Issue Type: Bug > Components: JobManager >Reporter: Ufuk Celebi >Assignee: Ufuk Celebi > Fix For: 1.1.4 > > > I saw a NPE while archiving a ExecutionVertex: > {code} > execution graph > org.apache.flink.runtime.executiongraph.ExecutionGraph@c4e0722 for archiving. > java.lang.NullPointerException: null > at > org.apache.flink.runtime.executiongraph.ExecutionVertex.prepareForArchiving(ExecutionVertex.java:583) > at > org.apache.flink.runtime.executiongraph.ExecutionJobVertex.prepareForArchiving(ExecutionJobVertex.java:439) > at > org.apache.flink.runtime.executiongraph.ExecutionGraph.prepareForArchiving(ExecutionGraph.java:1042) > ... > {code} > I think this is due to the newly introduced {{EvictingBoundedList}} which > returns a default element ({{null}})) for evicted elements when iterating > over it. > This affects the backport to release-1.1 only. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
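The {{EvictingBoundedList}} behaviour blamed in FLINK-5276 can be illustrated with the simplified stand-in below (not Flink's actual class): a bounded list that keeps only the last N elements but remembers its logical length, returning null for evicted slots. Iterating over it without a null check, as the archiving code apparently did, produces exactly this kind of NPE.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified stand-in for the evicting bounded list described in FLINK-5276.
class EvictingBoundedListSketch<T> {
    private final int bound;
    private final List<T> tail = new ArrayList<>(); // only the last `bound` elements
    private int size; // logical number of elements ever added

    EvictingBoundedListSketch(int bound) { this.bound = bound; }

    void add(T element) {
        if (tail.size() == bound) {
            tail.remove(0); // evict the oldest retained element
        }
        tail.add(element);
        size++;
    }

    // Returns null for indices whose element was evicted -- callers must check.
    T get(int index) {
        int firstRetained = size - tail.size();
        return index < firstRetained ? null : tail.get(index - firstRetained);
    }

    int size() { return size; }
}
```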
[jira] [Assigned] (FLINK-5276) ExecutionVertex archiving can throw NPE with many previous attempts
[ https://issues.apache.org/jira/browse/FLINK-5276?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ufuk Celebi reassigned FLINK-5276: -- Assignee: Ufuk Celebi (was: Stefan Richter) > ExecutionVertex archiving can throw NPE with many previous attempts > --- > > Key: FLINK-5276 > URL: https://issues.apache.org/jira/browse/FLINK-5276 > Project: Flink > Issue Type: Bug > Components: JobManager >Reporter: Ufuk Celebi >Assignee: Ufuk Celebi > Fix For: 1.1.4 > > > I saw a NPE while archiving a ExecutionVertex: > {code} > execution graph > org.apache.flink.runtime.executiongraph.ExecutionGraph@c4e0722 for archiving. > java.lang.NullPointerException: null > at > org.apache.flink.runtime.executiongraph.ExecutionVertex.prepareForArchiving(ExecutionVertex.java:583) > at > org.apache.flink.runtime.executiongraph.ExecutionJobVertex.prepareForArchiving(ExecutionJobVertex.java:439) > at > org.apache.flink.runtime.executiongraph.ExecutionGraph.prepareForArchiving(ExecutionGraph.java:1042) > ... > {code} > I think this is due to the newly introduced {{EvictingBoundedList}} which > returns a default element ({{null}})) for evicted elements when iterating > over it. > This affects the backport to release-1.1 only. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (FLINK-5276) ExecutionVertex archiving can throw NPE with many previous attempts
[ https://issues.apache.org/jira/browse/FLINK-5276?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ufuk Celebi updated FLINK-5276: --- Fix Version/s: (was: 1.2.0) > ExecutionVertex archiving can throw NPE with many previous attempts > --- > > Key: FLINK-5276 > URL: https://issues.apache.org/jira/browse/FLINK-5276 > Project: Flink > Issue Type: Bug > Components: JobManager >Reporter: Ufuk Celebi >Assignee: Stefan Richter > Fix For: 1.1.4 > > > I saw a NPE while archiving a ExecutionVertex: > {code} > execution graph > org.apache.flink.runtime.executiongraph.ExecutionGraph@c4e0722 for archiving. > java.lang.NullPointerException: null > at > org.apache.flink.runtime.executiongraph.ExecutionVertex.prepareForArchiving(ExecutionVertex.java:583) > at > org.apache.flink.runtime.executiongraph.ExecutionJobVertex.prepareForArchiving(ExecutionJobVertex.java:439) > at > org.apache.flink.runtime.executiongraph.ExecutionGraph.prepareForArchiving(ExecutionGraph.java:1042) > ... > {code} > I think this is due to the newly introduced {{EvictingBoundedList}} which > returns a default element ({{null}})) for evicted elements when iterating > over it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (FLINK-5276) ExecutionVertex archiving can throw NPE with many previous attempts
[ https://issues.apache.org/jira/browse/FLINK-5276?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ufuk Celebi updated FLINK-5276: --- Description: I saw a NPE while archiving a ExecutionVertex: {code} execution graph org.apache.flink.runtime.executiongraph.ExecutionGraph@c4e0722 for archiving. java.lang.NullPointerException: null at org.apache.flink.runtime.executiongraph.ExecutionVertex.prepareForArchiving(ExecutionVertex.java:583) at org.apache.flink.runtime.executiongraph.ExecutionJobVertex.prepareForArchiving(ExecutionJobVertex.java:439) at org.apache.flink.runtime.executiongraph.ExecutionGraph.prepareForArchiving(ExecutionGraph.java:1042) ... {code} I think this is due to the newly introduced {{EvictingBoundedList}} which returns a default element ({{null}})) for evicted elements when iterating over it. This affects the backport to release-1.1 only. was: I saw a NPE while archiving a ExecutionVertex: {code} execution graph org.apache.flink.runtime.executiongraph.ExecutionGraph@c4e0722 for archiving. java.lang.NullPointerException: null at org.apache.flink.runtime.executiongraph.ExecutionVertex.prepareForArchiving(ExecutionVertex.java:583) at org.apache.flink.runtime.executiongraph.ExecutionJobVertex.prepareForArchiving(ExecutionJobVertex.java:439) at org.apache.flink.runtime.executiongraph.ExecutionGraph.prepareForArchiving(ExecutionGraph.java:1042) ... {code} I think this is due to the newly introduced {{EvictingBoundedList}} which returns a default element ({{null}})) for evicted elements when iterating over it. 
> ExecutionVertex archiving can throw NPE with many previous attempts > --- > > Key: FLINK-5276 > URL: https://issues.apache.org/jira/browse/FLINK-5276 > Project: Flink > Issue Type: Bug > Components: JobManager >Reporter: Ufuk Celebi >Assignee: Stefan Richter > Fix For: 1.1.4 > > > I saw a NPE while archiving a ExecutionVertex: > {code} > execution graph > org.apache.flink.runtime.executiongraph.ExecutionGraph@c4e0722 for archiving. > java.lang.NullPointerException: null > at > org.apache.flink.runtime.executiongraph.ExecutionVertex.prepareForArchiving(ExecutionVertex.java:583) > at > org.apache.flink.runtime.executiongraph.ExecutionJobVertex.prepareForArchiving(ExecutionJobVertex.java:439) > at > org.apache.flink.runtime.executiongraph.ExecutionGraph.prepareForArchiving(ExecutionGraph.java:1042) > ... > {code} > I think this is due to the newly introduced {{EvictingBoundedList}} which > returns a default element ({{null}})) for evicted elements when iterating > over it. > This affects the backport to release-1.1 only. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-5041) Implement savepoint backwards compatibility 1.1 -> 1.2
[ https://issues.apache.org/jira/browse/FLINK-5041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15728721#comment-15728721 ] Ufuk Celebi commented on FLINK-5041: Subtasks FLINK-5042 and FLINK-5043 implemented in in af3bf83 (master). > Implement savepoint backwards compatibility 1.1 -> 1.2 > -- > > Key: FLINK-5041 > URL: https://issues.apache.org/jira/browse/FLINK-5041 > Project: Flink > Issue Type: New Feature > Components: State Backends, Checkpointing >Affects Versions: 1.2.0 >Reporter: Stefan Richter >Assignee: Stefan Richter > > This issue tracks the implementation of backwards compatibility between Flink > 1.1 and 1.2 releases. > This task subsumes: > - Converting old savepoints to new savepoints, including a conversion of > state handles to their new replacement. > - Converting keyed state from old backend implementations to their new > counterparts. > - Converting operator and function state for all changed operators. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Closed] (FLINK-5043) Converting keyed state from Flink 1.1 backend implementations to their new counterparts in 1.2
[ https://issues.apache.org/jira/browse/FLINK-5043?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ufuk Celebi closed FLINK-5043. -- Resolution: Implemented Fix Version/s: 1.2.0 Implemented in af3bf83 (master). > Converting keyed state from Flink 1.1 backend implementations to their new > counterparts in 1.2 > -- > > Key: FLINK-5043 > URL: https://issues.apache.org/jira/browse/FLINK-5043 > Project: Flink > Issue Type: Sub-task > Components: State Backends, Checkpointing >Affects Versions: 1.2.0 >Reporter: Stefan Richter >Assignee: Stefan Richter > Fix For: 1.2.0 > > > Keyed state backends became keygroup aware in Flink 1.2 and their hierarchy > as a whole changed significantly. We need to implement a conversion so that > old snapshots can be restored into new backends. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Closed] (FLINK-5042) Convert old savepoints to new savepoints
[ https://issues.apache.org/jira/browse/FLINK-5042?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ufuk Celebi closed FLINK-5042. -- Resolution: Implemented Fix Version/s: 1.2.0 Implemented in af3bf83 (master). > Convert old savepoints to new savepoints > > > Key: FLINK-5042 > URL: https://issues.apache.org/jira/browse/FLINK-5042 > Project: Flink > Issue Type: Sub-task > Components: State Backends, Checkpointing >Affects Versions: 1.2.0 >Reporter: Stefan Richter >Assignee: Stefan Richter > Fix For: 1.2.0 > > > The format of savepoints and the hierarchy of state handles changed a lot > between Flink 1.1 and 1.2. For backwards compatibility, we need to convert > old to new savepoints. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (FLINK-5276) ExecutionVertex archiving can throw NPE with many previous attempts
Ufuk Celebi created FLINK-5276: -- Summary: ExecutionVertex archiving can throw NPE with many previous attempts Key: FLINK-5276 URL: https://issues.apache.org/jira/browse/FLINK-5276 Project: Flink Issue Type: Bug Components: JobManager Reporter: Ufuk Celebi Assignee: Stefan Richter Fix For: 1.2.0, 1.1.4 I saw an NPE while archiving an ExecutionVertex: {code} execution graph org.apache.flink.runtime.executiongraph.ExecutionGraph@c4e0722 for archiving. java.lang.NullPointerException: null at org.apache.flink.runtime.executiongraph.ExecutionVertex.prepareForArchiving(ExecutionVertex.java:583) at org.apache.flink.runtime.executiongraph.ExecutionJobVertex.prepareForArchiving(ExecutionJobVertex.java:439) at org.apache.flink.runtime.executiongraph.ExecutionGraph.prepareForArchiving(ExecutionGraph.java:1042) ... {code} I think this is due to the newly introduced {{EvictingBoundedList}} which returns a default element ({{null}}) for evicted elements when iterating over it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (FLINK-5275) InputChanelDeploymentDescriptors throws misleading Exception if producer failed/cancelled
Ufuk Celebi created FLINK-5275: -- Summary: InputChannelDeploymentDescriptors throws misleading Exception if producer failed/cancelled Key: FLINK-5275 URL: https://issues.apache.org/jira/browse/FLINK-5275 Project: Flink Issue Type: Bug Components: Network Reporter: Ufuk Celebi Assignee: Ufuk Celebi Reported by [~rmetzger]. While creating the input channel deployment descriptors for a job that does not allow lazy deployment (all streaming jobs), Robert got an {{ExecutionGraphException}} with the message {{Trying to eagerly schedule a task whose inputs are not ready}} in the {{InputChannelDeploymentDescriptor}}. This was not reported as the root cause and is most likely due to a failed producer. For such cases, it makes sense to include the execution state of the producing task so that users see that this is related to the producer. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (FLINK-5274) LocalInputChannel throws NPE if partition reader is released
Ufuk Celebi created FLINK-5274: -- Summary: LocalInputChannel throws NPE if partition reader is released Key: FLINK-5274 URL: https://issues.apache.org/jira/browse/FLINK-5274 Project: Flink Issue Type: Bug Components: Network Reporter: Ufuk Celebi Assignee: Ufuk Celebi Reported by [~rmetzger]: {code} java.lang.NullPointerException at org.apache.flink.runtime.io.network.partition.consumer.LocalInputChannel.getNextBuffer(LocalInputChannel.java:185) at org.apache.flink.runtime.io.network.partition.consumer.SingleInputGate.getNextBufferOrEvent(SingleInputGate.java:444) at org.apache.flink.streaming.runtime.io.BarrierBuffer.getNextNonBlocked(BarrierBuffer.java:152) at org.apache.flink.streaming.runtime.io.StreamInputProcessor.processInput(StreamInputProcessor.java:195) at org.apache.flink.streaming.runtime.tasks.OneInputStreamTask.run(OneInputStreamTask.java:67) at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:267) at org.apache.flink.runtime.taskmanager.Task.run(Task.java:638) at java.lang.Thread.run(Thread.java:745) {code} This is most likely caused by the result partition being released after the notifications have happened. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
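The usual fix for this class of bug is a defensive read of the reader reference before dereferencing it. The sketch below is a hypothetical illustration of that pattern (the names {{SubpartitionView}}, {{subpartitionView}}, and {{getNextBuffer}} here are stand-ins, not Flink's actual API): read the possibly-nulled field once into a local, and fail with a descriptive {{IllegalStateException}} instead of an NPE when the reader was already released.

```java
/** Hypothetical sketch of the defensive check: instead of dereferencing a
 *  reader reference that is nulled on release, fail with a clear message. */
public class LocalChannelSketch {
    interface SubpartitionView { Object getNextBuffer(); }

    // Nulled by another thread when the result partition is released.
    static volatile SubpartitionView subpartitionView;

    static Object getNextBuffer() {
        SubpartitionView view = subpartitionView; // read the field once
        if (view == null) {
            // The fix: a descriptive error instead of a NullPointerException.
            throw new IllegalStateException(
                "Queried for a buffer before requesting the subpartition "
                + "or after the partition reader was released.");
        }
        return view.getNextBuffer();
    }

    public static void main(String[] args) {
        subpartitionView = () -> "buffer-0";
        System.out.println(getNextBuffer());

        subpartitionView = null; // simulate the producer releasing the partition
        try {
            getNextBuffer();
        } catch (IllegalStateException e) {
            System.out.println("caught: " + e.getMessage());
        }
    }
}
```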
[jira] [Closed] (FLINK-5179) MetricRegistry life-cycle issues with HA
[ https://issues.apache.org/jira/browse/FLINK-5179?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ufuk Celebi closed FLINK-5179. -- Resolution: Fixed Fixed in e5f4e3d (master). > MetricRegistry life-cycle issues with HA > > > Key: FLINK-5179 > URL: https://issues.apache.org/jira/browse/FLINK-5179 > Project: Flink > Issue Type: Bug > Components: Metrics >Affects Versions: 1.2.0 >Reporter: Chesnay Schepler >Assignee: Chesnay Schepler >Priority: Blocker > Fix For: 1.2.0 > > > The TaskManager's MetricRegistry is started when the TaskManager is created > and shut down in the TaskManager's postStop method. > However, the registry is also shut down within the TaskManager's > disassociateFromJobManager method, yet it is not restarted when the > connection is re-established. > Effectively this means that a TaskManager that ever reconnected to a > JobManager will not report any metrics, since the reporters are shut down as > well. Metrics will no longer be sent to the WebInterface either. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
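The life-cycle issue above boils down to tying a long-lived resource's shutdown to the wrong event. A minimal sketch of the intended fix, using simplified stand-in classes rather than Flink's actual TaskManager code: the registry dies only in {{postStop}}, so losing and regaining the JobManager connection (e.g. during HA failover) leaves metrics reporting intact.

```java
/** Hypothetical sketch: tie the registry's shutdown to the component's own
 *  stop, not to losing the JobManager connection. */
public class RegistryLifecycleSketch {
    static class MetricRegistry {
        private boolean shutdown;
        boolean isShutdown() { return shutdown; }
        void shutdown() { shutdown = true; } // would also stop all reporters
    }

    static class TaskManager {
        final MetricRegistry registry = new MetricRegistry();

        void disassociateFromJobManager() {
            // The buggy version called registry.shutdown() here, so a
            // TaskManager that ever reconnected reported no metrics anymore.
        }

        void postStop() {
            registry.shutdown(); // the only place the registry should die
        }
    }

    public static void main(String[] args) {
        TaskManager tm = new TaskManager();
        tm.disassociateFromJobManager(); // e.g. JobManager failover under HA
        System.out.println("after reconnect, registry alive: " + !tm.registry.isShutdown());
        tm.postStop();
        System.out.println("after stop, registry alive: " + !tm.registry.isShutdown());
    }
}
```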
[jira] [Closed] (FLINK-5261) ScheduledDropwizardReporter does not properly clean up metrics
[ https://issues.apache.org/jira/browse/FLINK-5261?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ufuk Celebi closed FLINK-5261. -- Resolution: Fixed Fixed in f7f7b48 (master). > ScheduledDropwizardReporter does not properly clean up metrics > -- > > Key: FLINK-5261 > URL: https://issues.apache.org/jira/browse/FLINK-5261 > Project: Flink > Issue Type: Bug > Components: Metrics >Affects Versions: 1.2.0 >Reporter: Chesnay Schepler >Assignee: Chesnay Schepler >Priority: Blocker > Fix For: 1.2.0 > > > The ScheduledDropwizardReporter does not have a separate branch for meters in > notifyOfRemovedMetric. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-5147) StreamingOperatorsITCase.testGroupedFoldOperation failed on Travis
[ https://issues.apache.org/jira/browse/FLINK-5147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15725708#comment-15725708 ] Ufuk Celebi commented on FLINK-5147: Occurred again https://api.travis-ci.org/jobs/181647649/log.txt?deansi=true > StreamingOperatorsITCase.testGroupedFoldOperation failed on Travis > -- > > Key: FLINK-5147 > URL: https://issues.apache.org/jira/browse/FLINK-5147 > Project: Flink > Issue Type: Bug > Components: Tests >Affects Versions: 1.1.3 > Environment: https://travis-ci.org/apache/flink/jobs/177675906 >Reporter: Chesnay Schepler >Assignee: Chesnay Schepler > > The test failed with the following exception: > {code} > testGroupedFoldOperation(org.apache.flink.test.streaming.api.StreamingOperatorsITCase) > Time elapsed: 0.573 sec <<< ERROR! > org.apache.flink.runtime.client.JobExecutionException: Job execution failed. > at > org.apache.flink.runtime.jobmanager.JobManager$$anonfun$handleMessage$1$$anonfun$applyOrElse$6.apply$mcV$sp(JobManager.scala:905) > at > org.apache.flink.runtime.jobmanager.JobManager$$anonfun$handleMessage$1$$anonfun$applyOrElse$6.apply(JobManager.scala:848) > at > org.apache.flink.runtime.jobmanager.JobManager$$anonfun$handleMessage$1$$anonfun$applyOrElse$6.apply(JobManager.scala:848) > at > scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24) > at > scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24) > at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:41) > at > akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:401) > at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260) > at > scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339) > at > scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979) > at > scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107) > Caused by: java.lang.NullPointerException: null 
> at > org.apache.flink.core.fs.local.LocalFileSystem.delete(LocalFileSystem.java:187) > at > org.apache.flink.core.fs.FileSystem.initOutPathLocalFS(FileSystem.java:632) > at > org.apache.flink.api.common.io.FileOutputFormat.open(FileOutputFormat.java:239) > at > org.apache.flink.api.java.io.TextOutputFormat.open(TextOutputFormat.java:78) > at > org.apache.flink.streaming.api.functions.sink.OutputFormatSinkFunction.open(OutputFormatSinkFunction.java:60) > at > org.apache.flink.api.common.functions.util.FunctionUtils.openFunction(FunctionUtils.java:36) > at > org.apache.flink.streaming.api.operators.AbstractUdfStreamOperator.open(AbstractUdfStreamOperator.java:154) > at > org.apache.flink.streaming.runtime.tasks.StreamTask.openAllOperators(StreamTask.java:383) > at > org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:259) > at org.apache.flink.runtime.taskmanager.Task.run(Task.java:650) > at java.lang.Thread.run(Thread.java:745) > testGroupedFoldOperation(org.apache.flink.test.streaming.api.StreamingOperatorsITCase) > Time elapsed: 0.573 sec <<< FAILURE! > java.lang.AssertionError: Different number of lines in expected and obtained > result. expected:<4> but was:<1> > at org.junit.Assert.fail(Assert.java:88) > at org.junit.Assert.failNotEquals(Assert.java:834) > at org.junit.Assert.assertEquals(Assert.java:645) > at > org.apache.flink.test.util.TestBaseUtils.compareResultsByLinesInMemory(TestBaseUtils.java:316) > at > org.apache.flink.test.util.TestBaseUtils.compareResultsByLinesInMemory(TestBaseUtils.java:302) > at > org.apache.flink.test.streaming.api.StreamingOperatorsITCase.after(StreamingOperatorsITCase.java:63) > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (FLINK-5264) Improve error message when assigning uid to intermediate operator.
[ https://issues.apache.org/jira/browse/FLINK-5264?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ufuk Celebi updated FLINK-5264: --- Description: Currently when trying to assign uid to an intermediate operator in a chain, the error message just says that it is not the right place, without any further information. We could improve the message by explicitly telling the user on which operator to assign the uid. was: Currently when trying to assign uid to an intermediate operator in a chain, the error message just says that it is not the right place, without any further information. We could improve the message by explicitly telling the user on which operator to assign the uid. To take it one step further, I would also suggest to always ask the user to set a uid whenever possible, even if he did not try to set it. > Improve error message when assigning uid to intermediate operator. > -- > > Key: FLINK-5264 > URL: https://issues.apache.org/jira/browse/FLINK-5264 > Project: Flink > Issue Type: Bug >Affects Versions: 1.1.3 >Reporter: Kostas Kloudas >Assignee: Ufuk Celebi >Priority: Blocker > Fix For: 1.2.0 > > > Currently when trying to assign uid to an intermediate operator in a chain, > the error message just says that it is not the right place, without any > further information. > We could improve the message by explicitly telling the user on which > operator to assign the uid. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (FLINK-5264) Improve error message when assigning uid to intermediate operator
[ https://issues.apache.org/jira/browse/FLINK-5264?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ufuk Celebi updated FLINK-5264: --- Summary: Improve error message when assigning uid to intermediate operator (was: Improve error message when assigning uid to intermediate operator.) > Improve error message when assigning uid to intermediate operator > - > > Key: FLINK-5264 > URL: https://issues.apache.org/jira/browse/FLINK-5264 > Project: Flink > Issue Type: Bug >Affects Versions: 1.1.3 >Reporter: Kostas Kloudas >Assignee: Ufuk Celebi >Priority: Blocker > Fix For: 1.2.0 > > > Currently when trying to assign uid to an intermediate operator in a chain, > the error message just says that it is not the right place, without any > further information. > We could improve the message by explicitly telling the user on which > operator to assign the uid. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (FLINK-5264) Improve error message when assigning uid to intermediate operator.
[ https://issues.apache.org/jira/browse/FLINK-5264?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ufuk Celebi reassigned FLINK-5264: -- Assignee: Ufuk Celebi > Improve error message when assigning uid to intermediate operator. > -- > > Key: FLINK-5264 > URL: https://issues.apache.org/jira/browse/FLINK-5264 > Project: Flink > Issue Type: Bug >Affects Versions: 1.1.3 >Reporter: Kostas Kloudas >Assignee: Ufuk Celebi >Priority: Blocker > Fix For: 1.2.0 > > > Currently when trying to assign uid to an intermediate operator in a chain, > the error message just says that it is not the right place, without any > further information. > We could improve the message by explicitly telling the user on which > operator to assign the uid. > To take it one step further, I would also suggest to always ask the user to > set a uid whenever possible, even if he did not try to set it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (FLINK-5263) Add docs about JVM sharing between tasks
Ufuk Celebi created FLINK-5263: -- Summary: Add docs about JVM sharing between tasks Key: FLINK-5263 URL: https://issues.apache.org/jira/browse/FLINK-5263 Project: Flink Issue Type: Improvement Components: Documentation Reporter: Ufuk Celebi I've seen the question about why/how JVMs are shared between tasks and jobs pop up on the mailing lists multiple times now. It makes sense to write down which config options affect this, how to effectively configure one JVM per task and job, and other related issues such as the pros and cons of Flink's approach. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Closed] (FLINK-5248) SavepointITCase doesn't catch savepoint restore failure
[ https://issues.apache.org/jira/browse/FLINK-5248?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ufuk Celebi closed FLINK-5248. -- Resolution: Fixed > SavepointITCase doesn't catch savepoint restore failure > --- > > Key: FLINK-5248 > URL: https://issues.apache.org/jira/browse/FLINK-5248 > Project: Flink > Issue Type: Bug > Components: Tests >Affects Versions: 1.1.3 >Reporter: Ufuk Celebi >Assignee: Ufuk Celebi >Priority: Critical > Fix For: 1.2.0, 1.1.4 > > > The savepoint IT case does not fail when restoring from a savepoint fails on > the task side. It only checks that the deployment descriptors are as expected > when resubmitting. This is a pretty bad shortcoming. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Closed] (FLINK-5248) SavepointITCase doesn't catch savepoint restore failure
[ https://issues.apache.org/jira/browse/FLINK-5248?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ufuk Celebi closed FLINK-5248. -- Resolution: Fixed Fixed in 4dee8fe (master), a5065e3 (release-1.1). > SavepointITCase doesn't catch savepoint restore failure > --- > > Key: FLINK-5248 > URL: https://issues.apache.org/jira/browse/FLINK-5248 > Project: Flink > Issue Type: Bug > Components: Tests >Affects Versions: 1.1.3 >Reporter: Ufuk Celebi >Assignee: Ufuk Celebi >Priority: Critical > > The savepoint IT case does not fail when restoring from a savepoint fails on > the task side. It only checks that the deployment descriptors are as expected > when resubmitting. This is a pretty bad shortcoming. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (FLINK-5248) SavepointITCase doesn't catch savepoint restore failure
[ https://issues.apache.org/jira/browse/FLINK-5248?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ufuk Celebi updated FLINK-5248: --- Fix Version/s: 1.1.4 1.2.0 > SavepointITCase doesn't catch savepoint restore failure > --- > > Key: FLINK-5248 > URL: https://issues.apache.org/jira/browse/FLINK-5248 > Project: Flink > Issue Type: Bug > Components: Tests >Affects Versions: 1.1.3 >Reporter: Ufuk Celebi >Assignee: Ufuk Celebi >Priority: Critical > Fix For: 1.2.0, 1.1.4 > > > The savepoint IT case does not fail when restoring from a savepoint fails on > the task side. It only checks that the deployment descriptors are as expected > when resubmitting. This is a pretty bad shortcoming. -- This message was sent by Atlassian JIRA (v6.3.4#6332)