[jira] [Created] (FLINK-10033) Let Task release reference to Invokable on shutdown
Stephan Ewen created FLINK-10033:
---

Summary: Let Task release reference to Invokable on shutdown
Key: FLINK-10033
URL: https://issues.apache.org/jira/browse/FLINK-10033
Project: Flink
Issue Type: Bug
Components: TaskManager
Affects Versions: 1.5.2
Reporter: Stephan Ewen
Assignee: Stephan Ewen
Fix For: 1.5.3, 1.6.0

References to Task objects may, under some conditions, linger longer than the lifetime of the task. For example, in the case of local network channels, the receiving task may hold a reference to the object of the task that produced the data.

To prevent memory leaks, the Task needs to release all references to its AbstractInvokable when it shuts down or is cancelled.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
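The fix can be sketched in plain Java (all names hypothetical, not the actual Task code): the task holds its invokable in an `AtomicReference` and clears it on shutdown, so the invokable becomes garbage-collectable even while other components still reference the Task object.

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical sketch of a task that drops its invokable reference on shutdown.
public class TaskSketch {
    // AtomicReference so shutdown can race safely with concurrent readers.
    private final AtomicReference<Runnable> invokable;

    public TaskSketch(Runnable invokable) {
        this.invokable = new AtomicReference<>(invokable);
    }

    public boolean hasInvokable() {
        return invokable.get() != null;
    }

    /** Releases the invokable so it can be GC'ed even if the Task object lingers. */
    public void shutdown() {
        invokable.set(null);
    }

    public static void main(String[] args) {
        TaskSketch task = new TaskSketch(() -> {});
        task.shutdown();
        System.out.println("invokable present after shutdown: " + task.hasInvokable());
    }
}
```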
[jira] [Created] (FLINK-9865) flink-hadoop-compatibility should assume Hadoop as provided
Stephan Ewen created FLINK-9865:
---

Summary: flink-hadoop-compatibility should assume Hadoop as provided
Key: FLINK-9865
URL: https://issues.apache.org/jira/browse/FLINK-9865
Project: Flink
Issue Type: Bug
Components: Build System
Affects Versions: 1.5.1, 1.5.0
Reporter: Stephan Ewen
Assignee: Stephan Ewen
Fix For: 1.6.0

The {{flink-hadoop-compatibility}} project has a *compile*-scope dependency on Hadoop ({{flink-hadoop-shaded}}). Because of that, the Hadoop dependencies are pulled into the user application. As in other Hadoop-dependent modules, we should assume that Hadoop is already provided on the framework classpath.
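In POM terms, the change amounts to marking the Hadoop dependency as {{provided}} (a sketch; artifact coordinates and versions are illustrative only):

```xml
<!-- Illustrative: Hadoop is expected on the framework classpath, so it must not
     be bundled into the user application. -->
<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-client</artifactId>
    <version>${hadoop.version}</version>
    <scope>provided</scope>
</dependency>
```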
[jira] [Created] (FLINK-9809) Support setting CoLocation constraints on the DataStream Transformations
Stephan Ewen created FLINK-9809:
---

Summary: Support setting CoLocation constraints on the DataStream Transformations
Key: FLINK-9809
URL: https://issues.apache.org/jira/browse/FLINK-9809
Project: Flink
Issue Type: Improvement
Components: Streaming
Reporter: Stephan Ewen
Assignee: Stephan Ewen
Fix For: 1.6.0

Flink supports co-location constraints for operator placement during scheduling. This is used internally, for example for iterations, but is not exposed to users.

I propose to add a way for expert users to set these constraints. As a first step, I would add them to the {{StreamTransformation}}, which is not part of the public user-facing classes but a more internal class of the DataStream API. That way the feature is initially hidden, and we can gradually expose it more prominently once we agree that this is a good idea.
[jira] [Created] (FLINK-9776) Interrupt TaskThread only while in User/Operator code
Stephan Ewen created FLINK-9776:
---

Summary: Interrupt TaskThread only while in User/Operator code
Key: FLINK-9776
URL: https://issues.apache.org/jira/browse/FLINK-9776
Project: Flink
Issue Type: Improvement
Components: Local Runtime
Reporter: Stephan Ewen
Assignee: Stephan Ewen
Fix For: 1.6.0

Upon cancellation, the task thread is periodically interrupted. This helps to pull the thread out of blocking operations in user code. Once the thread leaves the user code, however, the repeated interrupts may interfere with the shutdown cleanup logic, causing confusing exceptions. We should stop sending the periodic interrupts once the thread has left the user code.
[jira] [Created] (FLINK-9753) Support ORC/Parquet for StreamingFileSink
Stephan Ewen created FLINK-9753:
---

Summary: Support ORC/Parquet for StreamingFileSink
Key: FLINK-9753
URL: https://issues.apache.org/jira/browse/FLINK-9753
Project: Flink
Issue Type: Sub-task
Components: Streaming Connectors
Reporter: Stephan Ewen
Assignee: Kostas Kloudas
Fix For: 1.6.0

Formats like Parquet and ORC are great at compressing data and making it fast to scan/filter/project. However, these formats are only efficient if they can columnarize and compress a significant amount of data in their columnar format. If they compress only a few rows at a time, they produce many short column vectors and are thus much less efficient.

The BucketingSink has the requirement that data is persistent on the target FileSystem on each checkpoint. Pushing data through a Parquet or ORC encoder and flushing on each checkpoint means that, for frequent checkpoints, the amount of data compressed/columnarized per block is small. Hence, the result is an inefficiently compressed file.

Making this efficient independently of the checkpoint interval means that the sink needs to first collect (and persist) a good amount of data and only then push it through the Parquet/ORC writers. I would suggest to approach this as follows:
- When writing to the "in progress" files, write the raw records (TypeSerializer encoding).
- When an "in progress" file is rolled over (published), the sink pushes the data through the encoder.
- This is not much work on top of the new abstraction and will result in large blocks and hence in efficient compression.

Alternatively, we can support directly encoding the stream to the "in progress" files via Parquet/ORC, if users know that their combination of data rate and checkpoint interval will result in large enough chunks of data per checkpoint interval.
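The suggested two-phase approach can be sketched in plain Java, with gzip standing in for a Parquet/ORC encoder (all names hypothetical, not the Flink API): raw records go to the "in progress" file, and the whole file is pushed through the encoder only on rollover, so the encoder sees one large block.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.List;
import java.util.zip.GZIPOutputStream;

public class TwoPhaseSink {

    /** Phase 1: append raw records to the "in progress" file (stand-in for TypeSerializer encoding). */
    static void appendRaw(Path inProgress, List<String> records) throws IOException {
        try (BufferedWriter w = Files.newBufferedWriter(inProgress,
                StandardOpenOption.CREATE, StandardOpenOption.APPEND)) {
            for (String r : records) {
                w.write(r);
                w.newLine();
            }
        }
    }

    /** Phase 2: on rollover, push the accumulated data through the encoder in one large block. */
    static void publish(Path inProgress, Path target) throws IOException {
        try (OutputStream out = new GZIPOutputStream(Files.newOutputStream(target))) {
            Files.copy(inProgress, out);
        }
        Files.delete(inProgress);
    }

    public static void main(String[] args) throws IOException {
        Path inProgress = Files.createTempFile("part", ".inprogress");
        Path target = Files.createTempFile("part", ".gz");
        appendRaw(inProgress, List.of("a", "b", "c"));
        publish(inProgress, target);
        System.out.println("published bytes: " + Files.size(target));
    }
}
```

The point of the sketch is only the ordering: compression happens once per rolled-over file, not once per checkpoint.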
[jira] [Created] (FLINK-9752) Add an S3 RecoverableWriter
Stephan Ewen created FLINK-9752:
---

Summary: Add an S3 RecoverableWriter
Key: FLINK-9752
URL: https://issues.apache.org/jira/browse/FLINK-9752
Project: Flink
Issue Type: Sub-task
Components: Streaming Connectors
Reporter: Stephan Ewen
Assignee: Kostas Kloudas

S3 offers persistence only when uploads are complete, that is, at the end of simple uploads and of parts of a MultiPartUpload.

We should implement a RecoverableWriter for S3 that performs a MultiPartUpload with one part per checkpoint. Recovering the writer needs the MultiPartUpload ID and the list of ETags of the previous parts.

We need additional staging of data in Flink state to work around the facts that:
- parts of a MultiPartUpload must be at least 5 MB, and
- part sizes must be known up front (note that data can still be streamed in the upload).
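The recovery state described above is small and serializable; a sketch of what it might hold (hypothetical names, not the actual Flink classes):

```java
import java.io.Serializable;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical sketch of the state needed to resume an S3 MultiPartUpload after a failure. */
public class S3Recoverable implements Serializable {
    private final String uploadId;          // the MultiPartUpload ID
    private final List<String> partETags;   // ETags of all parts completed so far

    public S3Recoverable(String uploadId, List<String> partETags) {
        this.uploadId = uploadId;
        this.partETags = new ArrayList<>(partETags);
    }

    public String uploadId() {
        return uploadId;
    }

    public List<String> partETags() {
        return Collections.unmodifiableList(partETags);
    }

    /** Returns a new recoverable including one more completed part (one part per checkpoint). */
    public S3Recoverable withPart(String eTag) {
        List<String> tags = new ArrayList<>(partETags);
        tags.add(eTag);
        return new S3Recoverable(uploadId, tags);
    }

    public static void main(String[] args) {
        S3Recoverable r = new S3Recoverable("upload-1", List.of())
                .withPart("etag-1")
                .withPart("etag-2");
        System.out.println(r.uploadId() + " with " + r.partETags().size() + " parts");
    }
}
```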
[jira] [Created] (FLINK-9751) Add a RecoverableWriter to the FileSystem abstraction
Stephan Ewen created FLINK-9751:
---

Summary: Add a RecoverableWriter to the FileSystem abstraction
Key: FLINK-9751
URL: https://issues.apache.org/jira/browse/FLINK-9751
Project: Flink
Issue Type: Sub-task
Affects Versions: 1.6.0
Reporter: Stephan Ewen
Assignee: Stephan Ewen

The core operation of the StreamingFileSink is to append result data to (hidden) "in progress" files and then, when the files should roll over, publish them as visible files. At each checkpoint, the data so far must be persistent in the "in progress" files. On recovery, we resume the "in progress" file at the exact position of the checkpoint, or publish up to the position of that checkpoint.

In order to support various file systems and object stores, we need an interface that captures these core operations and allows for different implementations (such as file truncate/append on POSIX, MultiPartUpload on S3, ...).

Proposed interface:

{code:java}
/**
 * A handle to an in-progress stream with a defined and persistent amount of data.
 * The handle can be used to recover the stream and publish the result file.
 */
interface CommitRecoverable { ... }

/**
 * A handle to an in-progress stream with a defined and persistent amount of data.
 * The handle can be used to recover the stream and either publish the result file
 * or keep appending data to the stream.
 */
interface ResumeRecoverable extends CommitRecoverable { ... }

/**
 * An output stream to a file system that can be recovered at well defined points.
 * The stream initially writes to hidden files or temp files and only creates the
 * target file once it is closed and "committed".
 */
public abstract class RecoverableFsDataOutputStream extends FSDataOutputStream {

    /**
     * Ensures all data so far is persistent (similar to {@link #sync()}) and returns
     * a handle to recover the stream at the current position.
     */
    public abstract ResumeRecoverable persist() throws IOException;

    /**
     * Closes the stream, ensuring persistence of all data (similar to {@link #sync()}).
     * This returns a Committer that can be used to publish (make visible) the file
     * that the stream was writing to.
     */
    public abstract Committer closeForCommit() throws IOException;

    /**
     * A committer can publish the file of a stream that was closed.
     * The Committer can be recovered via a {@link CommitRecoverable}.
     */
    public interface Committer {
        void commit() throws IOException;
        CommitRecoverable getRecoverable();
    }
}

/**
 * The RecoverableWriter creates and recovers RecoverableFsDataOutputStream.
 */
public interface RecoverableWriter {
    RecoverableFsDataOutputStream open(Path path) throws IOException;
    RecoverableFsDataOutputStream recover(ResumeRecoverable resumable) throws IOException;
    RecoverableFsDataOutputStream.Committer recoverForCommit(CommitRecoverable resumable) throws IOException;
}
{code}
[jira] [Created] (FLINK-9750) Create new StreamingFileSink that works on Flink's FileSystem abstraction
Stephan Ewen created FLINK-9750:
---

Summary: Create new StreamingFileSink that works on Flink's FileSystem abstraction
Key: FLINK-9750
URL: https://issues.apache.org/jira/browse/FLINK-9750
Project: Flink
Issue Type: Sub-task
Components: Streaming Connectors
Reporter: Stephan Ewen
Assignee: Kostas Kloudas
Fix For: 1.6.0

Using Flink's own file system abstraction means that we can add additional streaming/checkpointing-related behavior.

In addition, the new StreamingFileSink should rely only on internal checkpointed state to know which files are possibly in progress or need to roll over, and never assume it can enumerate files in the file system.
[jira] [Created] (FLINK-9749) Rework Bucketing Sink
Stephan Ewen created FLINK-9749:
---

Summary: Rework Bucketing Sink
Key: FLINK-9749
URL: https://issues.apache.org/jira/browse/FLINK-9749
Project: Flink
Issue Type: New Feature
Components: Streaming Connectors
Reporter: Stephan Ewen
Assignee: Kostas Kloudas
Fix For: 1.6.0

The BucketingSink has a series of deficits at the moment. Due to the long list of issues, I would suggest to add a new StreamingFileSink with a new and cleaner design.

h3. Encoders, Parquet, ORC
- It only efficiently supports row-wise data formats (Avro, JSON, sequence files).
- Efforts to add (columnar) compression for blocks of data are inefficient, because blocks cannot span checkpoints due to persistence-on-checkpoint.
- The encoders are part of the {{flink-connector-filesystem}} project, rather than in orthogonal format projects. This blows up the dependencies of the {{flink-connector-filesystem}} project. As an example, the rolling file sink has dependencies on Hadoop and Avro, which messes up dependency management.

h3. Use of FileSystems
- The BucketingSink works only on Hadoop's FileSystem abstraction, does not support Flink's own FileSystem abstraction, and cannot work with the packaged S3, maprfs, and swift file systems.
- The sink hence needs Hadoop as a dependency.
- The sink relies on "trying out" whether truncation works, which requires write access to the user's working directory.
- The sink relies on enumerating and counting files, rather than maintaining its own state, making it less efficient.

h3. Correctness and Efficiency on S3
- The BucketingSink relies on strong consistency in the file enumeration and hence may work incorrectly on S3.
- The BucketingSink relies on persisting streams at intermediate points. This does not work properly on S3, hence there may be data loss on S3.

h3. .valid-length companion file
- The valid-length file makes it hard for consumers of the data and should be dropped.
[jira] [Created] (FLINK-9428) Allow operators to flush data on checkpoint pre-barrier
Stephan Ewen created FLINK-9428:
---

Summary: Allow operators to flush data on checkpoint pre-barrier
Key: FLINK-9428
URL: https://issues.apache.org/jira/browse/FLINK-9428
Project: Flink
Issue Type: New Feature
Components: State Backends, Checkpointing
Reporter: Stephan Ewen
Assignee: Stephan Ewen
Fix For: 1.6.0

Some operators maintain small transient state that may be inefficient to checkpoint, especially when it would also need to be checkpointed in a re-scalable way. An example are opportunistic pre-aggregation operators, which have small pre-aggregation state that is frequently flushed downstream.

Rather than persisting that state in a checkpoint, it can make sense to flush the data downstream upon a checkpoint, to let it be part of the downstream operator's state.

This feature is sensitive, because flushing state has a direct implication on the downstream operator's checkpoint alignment. However, used with care, and with the new back-pressure-based checkpoint alignment, this feature can be very useful. Because it is sensitive, I suggest to make this only an internal feature (accessible to operators) and NOT expose it in the public API at this point.
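The idea can be sketched in plain Java (hypothetical names; a real operator would be invoked by the runtime just before forwarding the barrier): the pre-aggregator keeps small transient state and flushes it downstream when a checkpoint is about to be taken, instead of snapshotting it.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.BiConsumer;

/** Hypothetical sketch of a pre-aggregating operator that flushes on the checkpoint pre-barrier. */
public class PreAggregator {
    private final Map<String, Long> counts = new HashMap<>();
    private final BiConsumer<String, Long> downstream;

    public PreAggregator(BiConsumer<String, Long> downstream) {
        this.downstream = downstream;
    }

    public void process(String key) {
        counts.merge(key, 1L, Long::sum);
    }

    /** Called just before the checkpoint barrier is forwarded: flush instead of snapshotting. */
    public void prepareSnapshotPreBarrier() {
        counts.forEach(downstream);
        counts.clear(); // nothing left to include in this operator's checkpoint
    }

    public static void main(String[] args) {
        PreAggregator agg = new PreAggregator((k, v) -> System.out.println(k + " -> " + v));
        agg.process("a");
        agg.process("a");
        agg.prepareSnapshotPreBarrier(); // flushes the partial count downstream
    }
}
```

The flushed records become part of the downstream operator's state, which is exactly why the issue flags the alignment implication.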
[jira] [Created] (FLINK-9397) Individual Buffer Timeout of 0 incorrectly leads to default timeout
Stephan Ewen created FLINK-9397:
---

Summary: Individual Buffer Timeout of 0 incorrectly leads to default timeout
Key: FLINK-9397
URL: https://issues.apache.org/jira/browse/FLINK-9397
Project: Flink
Issue Type: Bug
Components: Streaming
Affects Versions: 1.4.2
Reporter: Stephan Ewen
Assignee: Stephan Ewen
Fix For: 1.5.0, 1.4.3

When configuring the buffer timeout of an individual operator to {{0}}, the StreamGraphGenerator incorrectly uses the application's default value (typically 100) instead.
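The fix amounts to distinguishing "explicitly set to 0" from "not set". A sketch of the intended resolution logic (hypothetical, not the actual StreamGraphGenerator code), using a negative value as the "not set" marker:

```java
public class BufferTimeouts {

    /** -1 means "not configured on this operator"; 0 is a valid explicit value (flush immediately). */
    public static long resolve(long operatorTimeout, long defaultTimeout) {
        return operatorTimeout >= 0 ? operatorTimeout : defaultTimeout;
    }

    public static void main(String[] args) {
        System.out.println(resolve(0L, 100L));   // explicit 0 must win over the default
        System.out.println(resolve(-1L, 100L));  // unset falls back to the default
    }
}
```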
[jira] [Created] (FLINK-9392) Add @FunctionalInterface annotations to all core functional interfaces
Stephan Ewen created FLINK-9392:
---

Summary: Add @FunctionalInterface annotations to all core functional interfaces
Key: FLINK-9392
URL: https://issues.apache.org/jira/browse/FLINK-9392
Project: Flink
Issue Type: Improvement
Components: Core
Reporter: Stephan Ewen
Assignee: Stephan Ewen

The {{@FunctionalInterface}} annotation should be added to all SAM interfaces in order to prevent accidentally breaking them (turning them into non-SAMs). We had such a case before with the {{SinkFunction}}, which remained compatible through default methods but became incompatible for users that had previously instantiated the interface through a lambda.
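For illustration (interface name hypothetical, modeled loosely on the SinkFunction case): marking a SAM interface with {{@FunctionalInterface}} makes the compiler reject any change that adds a second abstract method, while default methods remain allowed, so existing lambdas keep working.

```java
@FunctionalInterface
interface SinkLike<T> {
    void invoke(T value) throws Exception;

    // Default methods do not break the SAM property, so lambdas keep working.
    default void flush() {}

    // Adding a second *abstract* method here would now be a compile error,
    // instead of silently breaking every lambda implementation of this interface.
}

public class FunctionalInterfaceDemo {
    public static void main(String[] args) throws Exception {
        SinkLike<String> sink = v -> System.out.println("got: " + v); // still valid as a lambda
        sink.invoke("hello");
        sink.flush();
    }
}
```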
[jira] [Created] (FLINK-9314) Enable SSL mutual authentication for Netty / TaskManagers
Stephan Ewen created FLINK-9314:
---

Summary: Enable SSL mutual authentication for Netty / TaskManagers
Key: FLINK-9314
URL: https://issues.apache.org/jira/browse/FLINK-9314
Project: Flink
Issue Type: Sub-task
Components: Security
Reporter: Stephan Ewen
Assignee: Stephan Ewen

Making sure that TaskManagers authenticate both ways (client and server) requires giving access to the keystore and truststore on both ends, and enabling the client-authentication flag when creating the SSLEngine.
[jira] [Created] (FLINK-9313) Enable mutual authentication for RPC (akka)
Stephan Ewen created FLINK-9313:
---

Summary: Enable mutual authentication for RPC (akka)
Key: FLINK-9313
URL: https://issues.apache.org/jira/browse/FLINK-9313
Project: Flink
Issue Type: Sub-task
Components: Distributed Coordination
Reporter: Stephan Ewen
Assignee: Stephan Ewen

Trivial; we just need to add the respective line to the akka configuration.
[jira] [Created] (FLINK-9312) Perform mutual authentication during SSL handshakes
Stephan Ewen created FLINK-9312:
---

Summary: Perform mutual authentication during SSL handshakes
Key: FLINK-9312
URL: https://issues.apache.org/jira/browse/FLINK-9312
Project: Flink
Issue Type: New Feature
Components: Security
Reporter: Stephan Ewen
Fix For: 1.6.0

Currently, Flink encrypts connections via SSL:
- data exchange TM - TM
- RPC JM - TM
- Blob Service JM - TM

However, the server side always accepts any client connection, meaning the connections are not strongly authenticated. Activating SSL mutual authentication solves that: only processes that have the same certificate can connect.
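With the JDK's {{javax.net.ssl}} API, requiring client authentication on the server side is a single flag on the engine. A minimal sketch (Flink additionally needs the keystore/truststore wired up on both ends, which is omitted here):

```java
import javax.net.ssl.SSLContext;
import javax.net.ssl.SSLEngine;

public class MutualAuthDemo {
    public static void main(String[] args) throws Exception {
        SSLContext ctx = SSLContext.getDefault();

        SSLEngine server = ctx.createSSLEngine();
        server.setUseClientMode(false);
        // Require the client to present (and prove possession of) a certificate
        // during the handshake, instead of accepting any client:
        server.setNeedClientAuth(true);

        System.out.println("client auth required: " + server.getNeedClientAuth());
    }
}
```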
[jira] [Created] (FLINK-9310) Update default cipher suites
Stephan Ewen created FLINK-9310:
---

Summary: Update default cipher suites
Key: FLINK-9310
URL: https://issues.apache.org/jira/browse/FLINK-9310
Project: Flink
Issue Type: Task
Components: Security
Affects Versions: 1.4.2
Reporter: Stephan Ewen
Assignee: Stephan Ewen

The current default cipher suite {{TLS_RSA_WITH_AES_128_CBC_SHA}} is no longer recommended. RFC 7525 [1] recommends to use only the following cipher suites:
* TLS_DHE_RSA_WITH_AES_128_GCM_SHA256
* TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256
* TLS_DHE_RSA_WITH_AES_256_GCM_SHA384
* TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384

[1] https://tools.ietf.org/html/rfc7525
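As a sanity check, one can intersect the RFC 7525 suites with what a given JVM actually supports before enabling them (a standalone sketch using the JDK API, not Flink code):

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import javax.net.ssl.SSLContext;
import javax.net.ssl.SSLEngine;

public class CipherSuiteCheck {

    static final String[] RECOMMENDED = {
        "TLS_DHE_RSA_WITH_AES_128_GCM_SHA256",
        "TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256",
        "TLS_DHE_RSA_WITH_AES_256_GCM_SHA384",
        "TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384",
    };

    /** Returns the subset of the recommended suites that this JVM actually supports. */
    static String[] supportedSubset(SSLEngine engine) {
        Set<String> supported = new HashSet<>(Arrays.asList(engine.getSupportedCipherSuites()));
        return Arrays.stream(RECOMMENDED).filter(supported::contains).toArray(String[]::new);
    }

    public static void main(String[] args) throws Exception {
        SSLEngine engine = SSLContext.getDefault().createSSLEngine();
        String[] usable = supportedSubset(engine);
        engine.setEnabledCipherSuites(usable);
        System.out.println("enabled: " + Arrays.toString(engine.getEnabledCipherSuites()));
    }
}
```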
[jira] [Created] (FLINK-9292) Remove TypeInfoParser
Stephan Ewen created FLINK-9292:
---

Summary: Remove TypeInfoParser
Key: FLINK-9292
URL: https://issues.apache.org/jira/browse/FLINK-9292
Project: Flink
Issue Type: Task
Components: Core
Reporter: Stephan Ewen

The {{TypeInfoParser}} has been deprecated in favor of the {{TypeHint}}. Because the TypeInfoParser also does not work correctly with respect to classloading, we should remove it. Users still find the class, try to use it, and run into problems.
[jira] [Created] (FLINK-9279) PythonPlanBinderTest flaky
Stephan Ewen created FLINK-9279:
---

Summary: PythonPlanBinderTest flaky
Key: FLINK-9279
URL: https://issues.apache.org/jira/browse/FLINK-9279
Project: Flink
Issue Type: Bug
Components: Python API, Tests
Affects Versions: 1.5.0
Reporter: Stephan Ewen

The test fails while trying to create the parent directory {{/tmp/flink}}. That happens if a file with that name already exists. The Python plan binder apparently uses a fixed name for the temp directory, but should use a statistically unique random name instead.

Full test run log: https://api.travis-ci.org/v3/job/373120733/log.txt

Relevant stack trace:
{code}
Job execution failed.
org.apache.flink.runtime.client.JobExecutionException: Job execution failed.
	at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$handleMessage$1$$anonfun$applyOrElse$6.apply$mcV$sp(JobManager.scala:898)
	at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$handleMessage$1$$anonfun$applyOrElse$6.apply(JobManager.scala:841)
	at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$handleMessage$1$$anonfun$applyOrElse$6.apply(JobManager.scala:841)
	at scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24)
	at scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24)
	at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:39)
	at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:415)
	at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
	at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
	at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
	at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
Caused by: java.io.IOException: Mkdirs failed to create /tmp/flink
	at org.apache.flink.core.fs.local.LocalFileSystem.create(LocalFileSystem.java:271)
	at org.apache.flink.core.fs.SafetyNetWrapperFileSystem.create(SafetyNetWrapperFileSystem.java:121)
	at org.apache.flink.api.common.io.FileOutputFormat.open(FileOutputFormat.java:248)
	at org.apache.flink.api.java.io.CsvOutputFormat.open(CsvOutputFormat.java:161)
	at org.apache.flink.runtime.operators.DataSinkTask.invoke(DataSinkTask.java:202)
	at org.apache.flink.runtime.taskmanager.Task.run(Task.java:702)
	at java.lang.Thread.run(Thread.java:748)

Tests run: 3, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 59.839 sec <<< FAILURE! - in org.apache.flink.python.api.PythonPlanBinderTest
testJobWithoutObjectReuse(org.apache.flink.python.api.PythonPlanBinderTest)  Time elapsed: 14.912 sec  <<< FAILURE!
java.lang.AssertionError: Error while calling the test program: Job execution failed.
	at org.junit.Assert.fail(Assert.java:88)
	at org.apache.flink.test.util.JavaProgramTestBase.testJobWithoutObjectReuse(JavaProgramTestBase.java:161)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
	at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
	at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
	at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
	at org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:55)
	at org.junit.rules.RunRules.evaluate(RunRules.java:20)
	at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
	at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
	at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
	at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
	at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
	at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
	at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
	at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
	at org.junit.rules.ExternalResource$1.evaluate(ExternalResource.java:48)
	at org.junit.rules.ExternalResource$1.evaluate(ExternalResource.java:48)
	at org.junit.rules.RunRules.evaluate(RunRules.java:20)
	at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
	at org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:2
{code}
[jira] [Created] (FLINK-9277) Reduce noisiness of SlotPool logging
Stephan Ewen created FLINK-9277:
---

Summary: Reduce noisiness of SlotPool logging
Key: FLINK-9277
URL: https://issues.apache.org/jira/browse/FLINK-9277
Project: Flink
Issue Type: Improvement
Components: Distributed Coordination
Affects Versions: 1.5.0
Reporter: Stephan Ewen
Assignee: Till Rohrmann

The slot pool logs a very large amount of stack traces with meaningless exceptions like

{code}
org.apache.flink.util.FlinkException: Release multi task slot because all children have been released.
{code}

This makes log parsing very hard. For an example, see this log: https://gist.githubusercontent.com/GJL/3b109db48734ff40103f47d04fc54bd3/raw/e3afc0ec3f452bad681e388016bcf799bba56f10/gistfile1.txt
[jira] [Created] (FLINK-9276) Improve error message when TaskManager fails
Stephan Ewen created FLINK-9276:
---

Summary: Improve error message when TaskManager fails
Key: FLINK-9276
URL: https://issues.apache.org/jira/browse/FLINK-9276
Project: Flink
Issue Type: Improvement
Components: Distributed Coordination
Affects Versions: 1.5.0
Reporter: Stephan Ewen

When a TaskManager fails, we frequently get a message like

{code}
org.apache.flink.util.FlinkException: Releasing TaskManager container_1524853016208_0001_01_000102
{code}

This message is misleading, in that it sounds like an intended operation when it really is the failure of a container that the {{ResourceManager}} reports to the {{JobManager}}.
[jira] [Created] (FLINK-9198) Improve error messages in AbstractDeserializationSchema for type extraction
Stephan Ewen created FLINK-9198:
---

Summary: Improve error messages in AbstractDeserializationSchema for type extraction
Key: FLINK-9198
URL: https://issues.apache.org/jira/browse/FLINK-9198
Project: Flink
Issue Type: Improvement
Affects Versions: 1.4.2
Reporter: Stephan Ewen
Assignee: Stephan Ewen
Fix For: 1.5.0

User feedback: when type extraction fails in the {{AbstractDeserializationSchema}}, the error message does not fully explain how to fix the problem.

I suggest to improve the error message and to add some convenience constructors for directly passing TypeInformation when needed. We can also simplify the class a bit, because TypeInformation no longer needs to be dropped during serialization.
[jira] [Created] (FLINK-9197) Improve error message for TypeInformation and TypeHint with generics
Stephan Ewen created FLINK-9197:
---

Summary: Improve error message for TypeInformation and TypeHint with generics
Key: FLINK-9197
URL: https://issues.apache.org/jira/browse/FLINK-9197
Project: Flink
Issue Type: Improvement
Components: Core
Affects Versions: 1.4.2
Reporter: Stephan Ewen
Assignee: Stephan Ewen
Fix For: 1.5.0

User feedback: when using a {{TypeHint}} with a generic type variable, the error message could be better. Similarly, when using {{TypeInformation.of(Tuple2.class)}}, the error message should refer the user to the TypeHint method.
[jira] [Created] (FLINK-9192) Undo parameterization of StateMachine Example
Stephan Ewen created FLINK-9192:
---

Summary: Undo parameterization of StateMachine Example
Key: FLINK-9192
URL: https://issues.apache.org/jira/browse/FLINK-9192
Project: Flink
Issue Type: Improvement
Reporter: Stephan Ewen

The example has been changed to add parameterization and a different sink. I would vote to undo these changes; they make the example less nice and use non-recommended sinks:
- Having settings for the state backend, incremental checkpoints, async checkpoints, etc. in the example blows up its parameter list and distracts from what the example is about.
- If the main reason for these settings is an end-to-end test, then they should go into the test's Flink config.
- {{writeAsText}} is a sink that is not recommended for use, because it is not integrated with checkpoints and has no well-defined semantics.
[jira] [Created] (FLINK-9189) Add SBT and Gradle Quickstarts
Stephan Ewen created FLINK-9189:
---

Summary: Add SBT and Gradle Quickstarts
Key: FLINK-9189
URL: https://issues.apache.org/jira/browse/FLINK-9189
Project: Flink
Issue Type: Improvement
Components: Quickstarts
Reporter: Stephan Ewen

Having a proper project template helps a lot in getting dependencies right: for example, setting the core dependencies to "provided", the connector/library dependencies to "compile", etc. The Maven quickstarts are in good shape by now, but I have observed SBT and Gradle users getting this wrong quite often.
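In Gradle terms, the scoping described above would look roughly like this (a hedged sketch; artifact names, Scala suffixes, and versions are illustrative only):

```groovy
dependencies {
    // Core Flink APIs are provided by the cluster, so keep them out of the fat jar
    // (compileOnly is Gradle's closest analogue to Maven's "provided" scope):
    compileOnly "org.apache.flink:flink-streaming-java_2.11:1.6.0"

    // Connectors/libraries are not on the cluster classpath, so bundle them:
    implementation "org.apache.flink:flink-connector-kafka-0.11_2.11:1.6.0"
}
```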
[jira] [Created] (FLINK-9098) ClassLoaderITCase unstable
Stephan Ewen created FLINK-9098:
---

Summary: ClassLoaderITCase unstable
Key: FLINK-9098
URL: https://issues.apache.org/jira/browse/FLINK-9098
Project: Flink
Issue Type: Bug
Components: Tests
Reporter: Stephan Ewen
Fix For: 1.5.0

Some savepoint disposal seems to fail; after that, all subsequent tests fail because there are no longer enough slots.

Full log: https://api.travis-ci.org/v3/job/356900367/log.txt
[jira] [Created] (FLINK-9045) LocalEnvironment with web UI does not work with flip-6
Stephan Ewen created FLINK-9045:
---

Summary: LocalEnvironment with web UI does not work with flip-6
Key: FLINK-9045
URL: https://issues.apache.org/jira/browse/FLINK-9045
Project: Flink
Issue Type: Bug
Components: Webfrontend
Reporter: Stephan Ewen
Fix For: 1.5.0

The following code is supposed to start a web UI when executing in the IDE. As far as I can see, it does not work with flip-6.

{code}
final ExecutionEnvironment env = ExecutionEnvironment.createLocalEnvironmentWithWebUI(new Configuration());
{code}
[jira] [Created] (FLINK-9037) Test flake Kafka09ITCase#testCancelingEmptyTopic
Stephan Ewen created FLINK-9037:
---

Summary: Test flake Kafka09ITCase#testCancelingEmptyTopic
Key: FLINK-9037
URL: https://issues.apache.org/jira/browse/FLINK-9037
Project: Flink
Issue Type: Bug
Components: Tests
Affects Versions: 1.5.0
Reporter: Stephan Ewen

{code}
Test testCancelingEmptyTopic(org.apache.flink.streaming.connectors.kafka.Kafka09ITCase) failed with:
org.junit.runners.model.TestTimedOutException: test timed out after 6 milliseconds
{code}

Full log: https://api.travis-ci.org/v3/job/356044885/log.txt
[jira] [Created] (FLINK-9036) Add default value via suppliers
Stephan Ewen created FLINK-9036:
---

Summary: Add default value via suppliers
Key: FLINK-9036
URL: https://issues.apache.org/jira/browse/FLINK-9036
Project: Flink
Issue Type: Improvement
Components: State Backends, Checkpointing
Reporter: Stephan Ewen
Assignee: Stephan Ewen
Fix For: 1.6.0

Earlier versions had a default value in {{ValueState}}. We dropped that, because the value would have to be duplicated on each access to be safe against side effects when using mutable types.

For convenience, we should re-add the feature, but using a supplier/factory function that creates the default value on each access.
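A sketch of the supplier-based variant (hypothetical API, not the actual Flink signature): the factory is invoked on each access, so a mutable default instance is never shared and side effects cannot leak between accesses.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

/** Hypothetical sketch of a ValueState-like holder with a supplier-created default. */
public class DefaultingValue<T> {
    private final Supplier<T> defaultFactory;
    private T value;

    public DefaultingValue(Supplier<T> defaultFactory) {
        this.defaultFactory = defaultFactory;
    }

    /** Returns the stored value, or a *fresh* default created by the supplier. */
    public T get() {
        return value != null ? value : defaultFactory.get();
    }

    public void update(T newValue) {
        value = newValue;
    }

    public static void main(String[] args) {
        DefaultingValue<List<String>> state = new DefaultingValue<>(ArrayList::new);
        // Each access to the unset state gets its own instance, so mutating
        // one returned default cannot affect a later access:
        System.out.println(state.get() == state.get()); // false: two distinct lists
    }
}
```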
[jira] [Created] (FLINK-9035) State Descriptors have broken hashCode() and equals()
Stephan Ewen created FLINK-9035:
---

Summary: State Descriptors have broken hashCode() and equals()
Key: FLINK-9035
URL: https://issues.apache.org/jira/browse/FLINK-9035
Project: Flink
Issue Type: Bug
Components: State Backends, Checkpointing
Affects Versions: 1.4.2, 1.5.0
Reporter: Stephan Ewen
Assignee: Stephan Ewen
Fix For: 1.6.0

The following code fails with a {{NullPointerException}}:

{code}
ValueStateDescriptor<String> descr = new ValueStateDescriptor<>("name", String.class);
descr.hashCode();
{code}

The {{hashCode()}} function tries to access the {{serializer}} field, which may be uninitialized at that point.

The {{equals()}} method is equally broken (no pun intended):

{code}
ValueStateDescriptor<String> a = new ValueStateDescriptor<>("name", String.class);
ValueStateDescriptor<String> b = new ValueStateDescriptor<>("name", String.class);

a.equals(b); // exception
b.equals(a); // exception

a.initializeSerializerUnlessSet(new ExecutionConfig());

a.equals(b); // false
b.equals(a); // exception

b.initializeSerializerUnlessSet(new ExecutionConfig());

a.equals(b); // true
b.equals(a); // true
{code}
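One way to make the contract hold regardless of initialization order is to base {{hashCode()}}/{{equals()}} only on the always-initialized fields. A sketch with hypothetical names (not the actual fix):

```java
import java.util.Objects;

/** Sketch of a descriptor whose equals/hashCode ignore the lazily created serializer. */
public class DescriptorSketch {
    private final String name;
    private final Class<?> type;
    private Object serializer; // may still be null when hashCode()/equals() are called

    public DescriptorSketch(String name, Class<?> type) {
        this.name = name;
        this.type = type;
    }

    public void initializeSerializer() {
        serializer = new Object(); // stand-in for lazy serializer creation
    }

    @Override
    public int hashCode() {
        return Objects.hash(name, type); // never touches the lazy field
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) {
            return true;
        }
        if (!(o instanceof DescriptorSketch)) {
            return false;
        }
        DescriptorSketch that = (DescriptorSketch) o;
        return name.equals(that.name) && type.equals(that.type);
    }

    public static void main(String[] args) {
        DescriptorSketch a = new DescriptorSketch("name", String.class);
        DescriptorSketch b = new DescriptorSketch("name", String.class);
        // Symmetric and exception-free before either serializer is initialized:
        System.out.println(a.equals(b) && b.equals(a));
    }
}
```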
[jira] [Created] (FLINK-9034) State Descriptors drop TypeInformation on serialization
Stephan Ewen created FLINK-9034:
---

Summary: State Descriptors drop TypeInformation on serialization
Key: FLINK-9034
URL: https://issues.apache.org/jira/browse/FLINK-9034
Project: Flink
Issue Type: Bug
Components: State Backends, Checkpointing
Affects Versions: 1.4.2, 1.5.0
Reporter: Stephan Ewen
Fix For: 1.6.0

The following code currently causes problems:

{code}
public class MyFunction extends RichMapFunction<MyType, MyType> {

    private final ValueStateDescriptor<MyType> descr =
            new ValueStateDescriptor<>("state name", MyType.class);

    private ValueState<MyType> state;

    @Override
    public void open(Configuration parameters) {
        state = getRuntimeContext().getState(descr);
    }
}
{code}

The problem is that the state descriptor drops the type information and creates a serializer before serialization, as part of shipping the function to the cluster. To do that, it initializes the serializer with an empty execution config, making serialization inconsistent.

This is mainly an artifact from the days when dropping the type information before shipping was necessary, because the type info was not serializable. It now is, and we can fix that bug.
[jira] [Created] (FLINK-8885) The DispatcherThreadFactory should register uncaught exception handlers
Stephan Ewen created FLINK-8885: --- Summary: The DispatcherThreadFactory should register uncaught exception handlers Key: FLINK-8885 URL: https://issues.apache.org/jira/browse/FLINK-8885 Project: Flink Issue Type: Bug Components: TaskManager Reporter: Stephan Ewen Assignee: Stephan Ewen Fix For: 1.5.0, 1.6.0 The {{DispatcherThreadFactory}} is responsible for spawning the thread pool threads for the TaskManager's async dispatcher and for the CheckpointCoordinator's timed trigger. In case of uncaught exceptions in these threads, the system is not healthy any more, hence these threads should register the {{FatalExitUncaughtExceptionsHandler}}. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
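For illustration, a minimal thread factory that installs an uncaught exception handler on every thread it creates; the class name and the handler's behavior are hypothetical, not Flink's actual classes:

```java
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.atomic.AtomicInteger;

// Illustrative sketch: every thread created by this factory gets an
// uncaught exception handler, so failures in pooled threads cannot
// vanish silently. Names are hypothetical, not Flink's actual classes.
class GuardedThreadFactory implements ThreadFactory {
    private final String namePrefix;
    private final Thread.UncaughtExceptionHandler handler;
    private final AtomicInteger counter = new AtomicInteger();

    GuardedThreadFactory(String namePrefix, Thread.UncaughtExceptionHandler handler) {
        this.namePrefix = namePrefix;
        this.handler = handler;
    }

    @Override
    public Thread newThread(Runnable runnable) {
        Thread t = new Thread(runnable, namePrefix + '-' + counter.incrementAndGet());
        t.setDaemon(true);
        t.setUncaughtExceptionHandler(handler); // e.g. log and hard-exit the process
        return t;
    }
}
```

A handler registered this way receives any throwable that escapes the thread's run() method, which is exactly the hook a fatal-exit handler needs.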
[jira] [Created] (FLINK-8884) The DispatcherThreadFactory should register uncaught exception handlers
Stephan Ewen created FLINK-8884: --- Summary: The DispatcherThreadFactory should register uncaught exception handlers Key: FLINK-8884 URL: https://issues.apache.org/jira/browse/FLINK-8884 Project: Flink Issue Type: Bug Components: TaskManager Reporter: Stephan Ewen Assignee: Stephan Ewen Fix For: 1.5.0, 1.6.0 The {{DispatcherThreadFactory}} is responsible for spawning the thread pool threads for the TaskManager's async dispatcher and for the CheckpointCoordinator's timed trigger. In case of uncaught exceptions in these threads, the system is not healthy any more, hence these threads should register the {{FatalExitUncaughtExceptionsHandler}}. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (FLINK-8883) ExceptionUtils.rethrowIfFatalError should treat ThreadDeath as fatal.
Stephan Ewen created FLINK-8883: --- Summary: ExceptionUtils.rethrowIfFatalError should treat ThreadDeath as fatal. Key: FLINK-8883 URL: https://issues.apache.org/jira/browse/FLINK-8883 Project: Flink Issue Type: Bug Components: Core Reporter: Stephan Ewen Assignee: Stephan Ewen Fix For: 1.5.0, 1.6.0 Thread deaths leave code in an inconsistent state and should thus always be forwarded as fatal exceptions that cannot be handled in any way. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
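A hedged sketch of the proposed check, assuming the utility simply rethrows {{ThreadDeath}} and JVM errors and leaves everything else to normal handling (this is not the actual Flink implementation):

```java
// Hedged sketch of the proposed behavior: ThreadDeath and JVM errors are
// rethrown as fatal, everything else is left to normal handling.
final class FatalErrorSketch {
    private FatalErrorSketch() {}

    static void rethrowIfFatalError(Throwable t) {
        if (t instanceof VirtualMachineError) {
            throw (VirtualMachineError) t;
        }
        if (t instanceof ThreadDeath) {
            // a ThreadDeath leaves code in an inconsistent state and
            // cannot be meaningfully handled
            throw (ThreadDeath) t;
        }
    }
}
```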
[jira] [Created] (FLINK-8879) Add concurrent access check to AvroSerializer
Stephan Ewen created FLINK-8879: --- Summary: Add concurrent access check to AvroSerializer Key: FLINK-8879 URL: https://issues.apache.org/jira/browse/FLINK-8879 Project: Flink Issue Type: Sub-task Components: Core Reporter: Stephan Ewen Assignee: Stephan Ewen Fix For: 1.5.0, 1.6.0 On debug log level and during tests, the AvroSerializer should check whether it is concurrently accessed, and throw an exception in that case. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (FLINK-8878) Check for concurrent access to Kryo Serializer
Stephan Ewen created FLINK-8878: --- Summary: Check for concurrent access to Kryo Serializer Key: FLINK-8878 URL: https://issues.apache.org/jira/browse/FLINK-8878 Project: Flink Issue Type: Sub-task Components: Core Reporter: Stephan Ewen Assignee: Stephan Ewen Fix For: 1.5.0, 1.6.0 On debug log level and during tests, the {{KryoSerializer}} should check whether it is concurrently accessed, and throw an exception in that case. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (FLINK-8877) Configure Kryo's log level based on Flink's log level
Stephan Ewen created FLINK-8877: --- Summary: Configure Kryo's log level based on Flink's log level Key: FLINK-8877 URL: https://issues.apache.org/jira/browse/FLINK-8877 Project: Flink Issue Type: Sub-task Components: Core Reporter: Stephan Ewen Assignee: Stephan Ewen Fix For: 1.5.0, 1.6.0 Kryo uses its embedded MinLog for logging. When Flink is set to trace, Kryo should be set to trace as well. Other log levels should not be used, as even debug logging in Kryo results in excessive logging. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (FLINK-8876) Improve concurrent access handling in stateful serializers
Stephan Ewen created FLINK-8876: --- Summary: Improve concurrent access handling in stateful serializers Key: FLINK-8876 URL: https://issues.apache.org/jira/browse/FLINK-8876 Project: Flink Issue Type: Improvement Components: Core Reporter: Stephan Ewen Assignee: Stephan Ewen Fix For: 1.5.0, 1.6.0 Some stateful serializers produce incorrect results when accidentally accessed by multiple threads concurrently. To better catch these cases, I suggest to add concurrency checks that are active only when debug logging is enabled, and during test runs. This is inspired by Kryo's checks for concurrent access. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
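One possible shape for such a debug-time guard, sketched with hypothetical names (this is not the actual Flink implementation): a serializer would call enter() before and exit() after each serialize/deserialize call, and a second thread entering concurrently triggers an exception.

```java
import java.util.concurrent.atomic.AtomicReference;

// Illustrative debug-time guard (hypothetical names): records which
// thread is currently inside the serializer; a different thread
// entering concurrently triggers an IllegalStateException.
class ConcurrentAccessCheck {
    private final AtomicReference<Thread> owner = new AtomicReference<>();

    void enter() {
        Thread current = Thread.currentThread();
        Thread previous = owner.getAndSet(current);
        if (previous != null && previous != current) {
            throw new IllegalStateException(
                    "Concurrent access by " + previous + " and " + current);
        }
    }

    void exit() {
        owner.set(null);
    }

    boolean inUse() {
        return owner.get() != null;
    }
}
```

Because the guard is only compiled in on debug/test paths, the atomic bookkeeping adds no cost to production serialization.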
[jira] [Created] (FLINK-8856) Move all interrupt() calls to TaskCanceler
Stephan Ewen created FLINK-8856: --- Summary: Move all interrupt() calls to TaskCanceler Key: FLINK-8856 URL: https://issues.apache.org/jira/browse/FLINK-8856 Project: Flink Issue Type: Bug Components: TaskManager Reporter: Stephan Ewen Fix For: 1.5.0 We need this to work around the following JVM bug: https://bugs.java.com/bugdatabase/view_bug.do?bug_id=8138622 To circumvent this problem, the {{TaskCancelerWatchDog}} must not call {{interrupt()}} at all, but only join on the executing thread (with timeout) and cause a hard exit once cancellation takes too long. A user affected by this problem reported it in FLINK-8834. Personal note: The Thread.join(...) method is unfortunately not 100% reliable either, because it uses {{System.currentTimeMillis()}} rather than {{System.nanoTime()}}. Because of that, sleeps can take overly long when the clock is adjusted. I wonder why the JDK authors do not follow their own recommendations and use {{System.nanoTime()}} for all relative time measures... -- This message was sent by Atlassian JIRA (v7.6.3#76005)
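A sketch of a watchdog-style join on the monotonic clock, assuming the watchdog only joins (never interrupts) and lets the caller escalate to a hard exit on timeout; names and structure are illustrative, not the actual TaskCancelerWatchDog code:

```java
// Sketch of a watchdog-style join on the monotonic clock. The watchdog
// never interrupts; if this returns false the thread is still alive and
// the caller would escalate (e.g. hard process exit). Names illustrative.
final class WatchdogSketch {
    private WatchdogSketch() {}

    static boolean joinWithDeadline(Thread thread, long timeoutMillis)
            throws InterruptedException {
        final long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        long remainingMillis;
        while (thread.isAlive()
                && (remainingMillis = (deadline - System.nanoTime()) / 1_000_000L) > 0) {
            // bounded wait; the loop re-checks the monotonic clock, so a
            // wall-clock adjustment cannot stretch the total wait
            thread.join(remainingMillis);
        }
        return !thread.isAlive();
    }
}
```

The outer loop re-deriving the remaining time from {{System.nanoTime()}} is what compensates for Thread.join's internal reliance on the wall clock.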
[jira] [Created] (FLINK-8837) Move DataStreamUtils to package 'experimental'.
Stephan Ewen created FLINK-8837: --- Summary: Move DataStreamUtils to package 'experimental'. Key: FLINK-8837 URL: https://issues.apache.org/jira/browse/FLINK-8837 Project: Flink Issue Type: Bug Components: Streaming Reporter: Stephan Ewen Fix For: 1.5.0 The class {{DataStreamUtils}} came from 'flink-contrib' and has now accidentally moved to the fully supported API packages. It should be in package 'experimental' to properly communicate that it is not guaranteed to be API stable. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (FLINK-8835) Fix TaskManager config keys
Stephan Ewen created FLINK-8835: --- Summary: Fix TaskManager config keys Key: FLINK-8835 URL: https://issues.apache.org/jira/browse/FLINK-8835 Project: Flink Issue Type: Bug Components: TaskManager Reporter: Stephan Ewen Fix For: 1.5.0 Many new config keys in the TaskManager don't follow the proper naming scheme. We need to clean those up before the release. I would also suggest to keep the key names short, because that makes it easier for users. When doing this cleanup pass over the config keys, I would suggest to also make some of the existing keys more hierarchical and harmonize them with the common scheme in Flink. ## New Keys * {{taskmanager.network.credit-based-flow-control.enabled}} => {{taskmanager.network.credit-model}} * {{taskmanager.exactly-once.blocking.data.enabled}} => {{task.checkpoint.alignment.blocking}} (we already have {{task.checkpoint.alignment.max-size}}) ## Existing Keys * {{taskmanager.debug.memory.startLogThread}} => {{taskmanager.debug.memory.log}} * {{taskmanager.debug.memory.logIntervalMs}} => {{taskmanager.debug.memory.log-interval}} * {{taskmanager.initial-registration-pause}} => {{taskmanager.registration.initial-backoff}} * {{taskmanager.max-registration-pause}} => {{taskmanager.registration.max-backoff}} * {{taskmanager.refused-registration-pause}} => {{taskmanager.registration.refused-backoff}} * {{taskmanager.maxRegistrationDuration}} => {{taskmanager.registration.timeout}} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (FLINK-8824) In Kafka Consumers, replace 'getCanonicalClassName()' with 'getClassName()'
Stephan Ewen created FLINK-8824: --- Summary: In Kafka Consumers, replace 'getCanonicalClassName()' with 'getClassName()' Key: FLINK-8824 URL: https://issues.apache.org/jira/browse/FLINK-8824 Project: Flink Issue Type: Bug Components: Kafka Connector Reporter: Stephan Ewen Fix For: 1.5.0 The connector uses {{getCanonicalClassName()}} in all places, rather than {{getClassName()}}. {{getCanonicalClassName()}}'s intention is to normalize class names for arrays, etc., but it is problematic when instantiating classes from class names. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
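The methods in question correspond to the JDK's {{Class#getCanonicalName()}} and {{Class#getName()}}. For nested classes the canonical name uses '.' where the binary name uses '$', so only the latter can be fed back into {{Class.forName()}}; the {{Outer}}/{{Nested}} classes below are throwaway examples for illustration:

```java
// Throwaway classes illustrating the difference between a nested class's
// canonical name ("Outer.Nested") and its binary name ("Outer$Nested").
// Only the binary name from getName() round-trips through Class.forName().
class Outer {
    static class Nested {}
}
```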
[jira] [Created] (FLINK-8806) Failure in UnionInputGate getNextBufferOrEvent()
Stephan Ewen created FLINK-8806: --- Summary: Failure in UnionInputGate getNextBufferOrEvent() Key: FLINK-8806 URL: https://issues.apache.org/jira/browse/FLINK-8806 Project: Flink Issue Type: Bug Components: Network Affects Versions: 1.5.0, 1.6.0 Reporter: Stephan Ewen Fix For: 1.5.0, 1.6.0 Error occurs in {{SelfConnectionITCase}}: Full log: https://api.travis-ci.org/v3/job/346847455/log.txt Exception Stack Trace {code} org.apache.flink.runtime.client.JobExecutionException: java.lang.IllegalStateException at org.apache.flink.runtime.minicluster.MiniCluster.executeJobBlocking(MiniCluster.java:527) at org.apache.flink.streaming.util.TestStreamEnvironment.execute(TestStreamEnvironment.java:79) at org.apache.flink.streaming.api.environment.StreamExecutionEnvironment.execute(StreamExecutionEnvironment.java:1510) at org.apache.flink.test.streaming.runtime.SelfConnectionITCase.differentDataStreamDifferentChain(SelfConnectionITCase.java:158) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50) at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47) at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) at org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:55) at org.junit.rules.RunRules.evaluate(RunRules.java:20) at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325) at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78) at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57) at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290) at 
org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71) at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288) at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58) at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268) at org.junit.rules.ExternalResource$1.evaluate(ExternalResource.java:48) at org.junit.rules.ExternalResource$1.evaluate(ExternalResource.java:48) at org.junit.rules.RunRules.evaluate(RunRules.java:20) at org.junit.runners.ParentRunner.run(ParentRunner.java:363) at org.junit.runners.Suite.runChild(Suite.java:128) at org.junit.runners.Suite.runChild(Suite.java:27) at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290) at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71) at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288) at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58) at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268) at org.junit.runners.ParentRunner.run(ParentRunner.java:363) at org.junit.runner.JUnitCore.run(JUnitCore.java:137) at org.junit.runner.JUnitCore.run(JUnitCore.java:115) at org.apache.maven.surefire.junitcore.JUnitCoreWrapper.createRequestAndRun(JUnitCoreWrapper.java:108) at org.apache.maven.surefire.junitcore.JUnitCoreWrapper.executeEager(JUnitCoreWrapper.java:78) at org.apache.maven.surefire.junitcore.JUnitCoreWrapper.execute(JUnitCoreWrapper.java:54) at org.apache.maven.surefire.junitcore.JUnitCoreProvider.invoke(JUnitCoreProvider.java:144) at org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:203) at org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:155) at org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:103) Caused by: java.lang.IllegalStateException at org.apache.flink.util.Preconditions.checkState(Preconditions.java:179) at 
org.apache.flink.runtime.io.network.partition.consumer.UnionInputGate.getNextBufferOrEvent(UnionInputGate.java:173) at org.apache.flink.streaming.runtime.io.BarrierTracker.getNextNonBlocked(BarrierTracker.java:94) at org.apache.flink.streaming.runtime.io.StreamTwoInputProcessor.processInput(StreamTwoInputProcessor.java:273) at org.apache.flink.streaming.runtime.tasks.TwoInputStreamTask.run(TwoInputStreamTask.java:115) at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:307) at org.apache.fl
[jira] [Created] (FLINK-8803) Mini Cluster Shutdown with HA unstable, causing test failures
Stephan Ewen created FLINK-8803: --- Summary: Mini Cluster Shutdown with HA unstable, causing test failures Key: FLINK-8803 URL: https://issues.apache.org/jira/browse/FLINK-8803 Project: Flink Issue Type: Bug Components: Tests Reporter: Stephan Ewen When the Mini Cluster is created for HA tests with ZooKeeper, the shutdown is unstable. It looks like ZooKeeper may be shut down before the JobManager is shut down, causing the shutdown procedure of the JobManager (specifically {{ZooKeeperSubmittedJobGraphStore.removeJobGraph}}) to block until tests time out. Full log: https://api.travis-ci.org/v3/job/346853707/log.txt Note that no ZK threads are alive any more, seems ZK is shut down already. Relevant Stack Traces: {code} "main" #1 prio=5 os_prio=0 tid=0x7f973800a800 nid=0x43b4 waiting on condition [0x7f973eb0b000] java.lang.Thread.State: TIMED_WAITING (parking) at sun.misc.Unsafe.park(Native Method) - parking to wait for <0x8966cf18> (a scala.concurrent.impl.Promise$CompletionLatch) at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215) at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedNanos(AbstractQueuedSynchronizer.java:1037) at java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireSharedNanos(AbstractQueuedSynchronizer.java:1328) at scala.concurrent.impl.Promise$DefaultPromise.tryAwait(Promise.scala:212) at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:222) at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:157) at scala.concurrent.Await$$anonfun$ready$1.apply(package.scala:169) at scala.concurrent.Await$$anonfun$ready$1.apply(package.scala:169) at scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:53) at scala.concurrent.Await$.ready(package.scala:169) at org.apache.flink.runtime.minicluster.FlinkMiniCluster.startInternalShutdown(FlinkMiniCluster.scala:469) at org.apache.flink.runtime.minicluster.FlinkMiniCluster.stop(FlinkMiniCluster.scala:435) at 
org.apache.flink.runtime.minicluster.FlinkMiniCluster.closeAsync(FlinkMiniCluster.scala:719) at org.apache.flink.test.util.MiniClusterResource.after(MiniClusterResource.java:104) at org.junit.rules.ExternalResource$1.evaluate(ExternalResource.java:50) ... {code} {code} "flink-akka.actor.default-dispatcher-2" #1012 prio=5 os_prio=0 tid=0x7f97394fa800 nid=0x3328 waiting on condition [0x7f971db29000] java.lang.Thread.State: TIMED_WAITING (parking) at sun.misc.Unsafe.park(Native Method) - parking to wait for <0x87f82a70> (a java.util.concurrent.CountDownLatch$Sync) at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215) at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedNanos(AbstractQueuedSynchronizer.java:1037) at java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireSharedNanos(AbstractQueuedSynchronizer.java:1328) at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:277) at org.apache.flink.shaded.curator.org.apache.curator.CuratorZookeeperClient.internalBlockUntilConnectedOrTimedOut(CuratorZookeeperClient.java:336) at org.apache.flink.shaded.curator.org.apache.curator.RetryLoop.callWithRetry(RetryLoop.java:107) at org.apache.flink.shaded.curator.org.apache.curator.framework.imps.DeleteBuilderImpl.pathInForeground(DeleteBuilderImpl.java:241) at org.apache.flink.shaded.curator.org.apache.curator.framework.imps.DeleteBuilderImpl.forPath(DeleteBuilderImpl.java:225) at org.apache.flink.shaded.curator.org.apache.curator.framework.imps.DeleteBuilderImpl.forPath(DeleteBuilderImpl.java:35) at org.apache.flink.runtime.zookeeper.ZooKeeperStateHandleStore.release(ZooKeeperStateHandleStore.java:478) at org.apache.flink.runtime.zookeeper.ZooKeeperStateHandleStore.releaseAndTryRemove(ZooKeeperStateHandleStore.java:435) at org.apache.flink.runtime.zookeeper.ZooKeeperStateHandleStore.releaseAndTryRemove(ZooKeeperStateHandleStore.java:405) at 
org.apache.flink.runtime.jobmanager.ZooKeeperSubmittedJobGraphStore.removeJobGraph(ZooKeeperSubmittedJobGraphStore.java:266) - locked <0x807f4258> (a java.lang.Object) at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$1.apply$mcV$sp(JobManager.scala:1727) at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$1.apply(JobManager.scala:1723) at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$1.apply(JobManager.scala:1723) at scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24) at scala.concurrent.impl.Futu
[jira] [Created] (FLINK-8800) Set Logging to TRACE for org.apache.flink.runtime.rest.handler.job.metrics.JobVertexMetricsHandler
Stephan Ewen created FLINK-8800: --- Summary: Set Logging to TRACE for org.apache.flink.runtime.rest.handler.job.metrics.JobVertexMetricsHandler Key: FLINK-8800 URL: https://issues.apache.org/jira/browse/FLINK-8800 Project: Flink Issue Type: Bug Components: REST Reporter: Stephan Ewen Fix For: 1.5.0, 1.6.0 When setting the log level to {{DEBUG}}, the logs are swamped with statements as below, making it hard to read the debug logs. {code} 2018-02-22 13:41:04,016 DEBUG org.apache.flink.runtime.rest.handler.job.metrics.JobVertexMetricsHandler - Received request /jobs/ec1c9d7a3c413a9523656efa58735009/vertices/ded95c643b42f31cf882a8986207fd30/metrics?get=0.currentLowWatermark. 2018-02-22 13:41:04,048 DEBUG org.apache.flink.runtime.rest.handler.job.metrics.JobVertexMetricsHandler - Received request /jobs/ec1c9d7a3c413a9523656efa58735009/vertices/eec5890dac9c38f66954443809beb5b0/metrics?get=0.currentLowWatermark. 2018-02-22 13:41:04,052 DEBUG org.apache.flink.runtime.rest.handler.job.metrics.JobVertexMetricsHandler - Received request /jobs/ec1c9d7a3c413a9523656efa58735009/vertices/2a964ee72788c82cb7d15e352d9a94f6/metrics?get=0.currentLowWatermark. 2018-02-22 13:41:04,079 DEBUG org.apache.flink.runtime.rest.handler.job.metrics.JobVertexMetricsHandler - Received request /jobs/ec1c9d7a3c413a9523656efa58735009/vertices/1d9c83f6e1879fdbe461aafac16eb8a5/metrics?get=0.currentLowWatermark. 2018-02-22 13:41:04,085 DEBUG org.apache.flink.runtime.rest.handler.job.metrics.JobVertexMetricsHandler - Received request /jobs/ec1c9d7a3c413a9523656efa58735009/vertices/4063620891a151092c5bcedb218870a6/metrics?get=0.currentLowWatermark. 2018-02-22 13:41:04,094 DEBUG org.apache.flink.runtime.rest.handler.job.metrics.JobVertexMetricsHandler - Received request /jobs/ec1c9d7a3c413a9523656efa58735009/vertices/2a751c66e0e32aee2cd8120a1a72a4d6/metrics?get=0.currentLowWatermark. 
2018-02-22 13:41:04,142 DEBUG org.apache.flink.runtime.rest.handler.job.metrics.JobVertexMetricsHandler - Received request /jobs/ec1c9d7a3c413a9523656efa58735009/vertices/37ecc85b429bd08d0fd539532055e117/metrics?get=0.currentLowWatermark. 2018-02-22 13:41:04,173 DEBUG org.apache.flink.runtime.rest.handler.job.metrics.JobVertexMetricsHandler - Received request /jobs/ec1c9d7a3c413a9523656efa58735009/vertices/20e20298680571979f690d36d1a6db36/metrics?get=0.currentLowWatermark. {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (FLINK-8798) Make commons-logging a parent-first pattern
Stephan Ewen created FLINK-8798: --- Summary: Make commons-logging a parent-first pattern Key: FLINK-8798 URL: https://issues.apache.org/jira/browse/FLINK-8798 Project: Flink Issue Type: Bug Components: Core Affects Versions: 1.4.1 Reporter: Stephan Ewen Assignee: Stephan Ewen Fix For: 1.5.0, 1.4.2, 1.6.0 The Apache {{commons-logging}} framework does not play well with child-first classloading. We need to make this a parent-first pattern. As a matter of fact, other frameworks that use inverted classloading (JBoss, Tomcat) force this library to be parent-first as well. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (FLINK-8791) Fix documentation on how to link dependencies
Stephan Ewen created FLINK-8791: --- Summary: Fix documentation on how to link dependencies Key: FLINK-8791 URL: https://issues.apache.org/jira/browse/FLINK-8791 Project: Flink Issue Type: Bug Components: Documentation Reporter: Stephan Ewen Assignee: Stephan Ewen Fix For: 1.5.0 The documentation in "Linking with Flink" and "Linking with Optional Dependencies" is very outdated and gives wrong advice to users. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (FLINK-8767) Set the maven.compiler.source and .target properties for Java Quickstart
Stephan Ewen created FLINK-8767: --- Summary: Set the maven.compiler.source and .target properties for Java Quickstart Key: FLINK-8767 URL: https://issues.apache.org/jira/browse/FLINK-8767 Project: Flink Issue Type: Sub-task Components: Quickstarts Reporter: Stephan Ewen Assignee: Stephan Ewen Fix For: 1.5.0 Setting these properties helps properly pinning the Java version in IntelliJ. Without these properties, Java version keeps switching back to 1.5 in some setups. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
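The properties in question would look roughly like this in the quickstart pom.xml; the Java version shown is illustrative:

```xml
<!-- Illustrative pom.xml snippet; 1.8 stands in for whatever Java
     version the quickstart targets. -->
<properties>
  <maven.compiler.source>1.8</maven.compiler.source>
  <maven.compiler.target>1.8</maven.compiler.target>
</properties>
```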
[jira] [Created] (FLINK-8765) Simplify quickstart properties
Stephan Ewen created FLINK-8765: --- Summary: Simplify quickstart properties Key: FLINK-8765 URL: https://issues.apache.org/jira/browse/FLINK-8765 Project: Flink Issue Type: Sub-task Components: Quickstarts Reporter: Stephan Ewen Assignee: Stephan Ewen Fix For: 1.5.0 This no longer pulls the slf4j and log4j versions out into properties, making the quickstarts a bit simpler. Given that both versions are used only once, and only for the feature to have convenience logging in the IDE, the versions might as well be defined directly in the dependencies. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (FLINK-8766) Pin scala runtime version for Java Quickstart
Stephan Ewen created FLINK-8766: --- Summary: Pin scala runtime version for Java Quickstart Key: FLINK-8766 URL: https://issues.apache.org/jira/browse/FLINK-8766 Project: Flink Issue Type: Sub-task Components: Quickstarts Reporter: Stephan Ewen Assignee: Stephan Ewen Fix For: 1.5.0 Followup to FLINK-7414, which pinned the scala version for the Scala Quickstart -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (FLINK-8764) Make quickstarts work out of the box for IDE and JAR packaging
Stephan Ewen created FLINK-8764: --- Summary: Make quickstarts work out of the box for IDE and JAR packaging Key: FLINK-8764 URL: https://issues.apache.org/jira/browse/FLINK-8764 Project: Flink Issue Type: Sub-task Components: Quickstarts Reporter: Stephan Ewen Assignee: Stephan Ewen Fix For: 1.5.0 We can make the quickstarts work for IntelliJ, Eclipse, and Maven Jar packaging out of the box, without the need to pass a profile name during jar packaging via the following trick: - All Flink and Scala dependencies are properly set to provided - That way, Maven JAR packaging behaves correctly by default - Eclipse adds 'provided' dependencies to the classpath when running programs, so works out of the box - There is a profile that automatically activates in IntelliJ that adds the necessary dependencies in 'compile' scope to make it run out of the box. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
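The IntelliJ-only profile could look roughly like the following; the profile id and the {{idea.version}} activation property are an assumption about the quickstart pom (IntelliJ's Maven integration sets that property), and the dependency shown is just an example of a 'provided' dependency re-added in 'compile' scope for running programs inside the IDE:

```xml
<!-- Hypothetical sketch of an IntelliJ-only profile. -->
<profile>
  <id>add-dependencies-for-IDEA</id>
  <activation>
    <property>
      <name>idea.version</name>
    </property>
  </activation>
  <dependencies>
    <dependency>
      <groupId>org.apache.flink</groupId>
      <artifactId>flink-streaming-java_2.11</artifactId>
      <version>${flink.version}</version>
      <scope>compile</scope>
    </dependency>
  </dependencies>
</profile>
```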
[jira] [Created] (FLINK-8762) Remove unnecessary examples and make "StreamingJob" the default
Stephan Ewen created FLINK-8762: --- Summary: Remove unnecessary examples and make "StreamingJob" the default Key: FLINK-8762 URL: https://issues.apache.org/jira/browse/FLINK-8762 Project: Flink Issue Type: Sub-task Components: Quickstarts Reporter: Stephan Ewen Assignee: Stephan Ewen Fix For: 1.5.0 The packaged WordCount example jobs have been reported to not be terribly helpful and simply create noise in the initial project setup. In addition, setting the main class by default to {{StreamingJob}} creates a better out of the box experience for the majority of the users. We prominently document how to adjust this to use {{BatchJob}} as the main class. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (FLINK-8763) Remove obsolete Dummy.java classes from quickstart projects.
Stephan Ewen created FLINK-8763: --- Summary: Remove obsolete Dummy.java classes from quickstart projects. Key: FLINK-8763 URL: https://issues.apache.org/jira/browse/FLINK-8763 Project: Flink Issue Type: Sub-task Components: Quickstarts Reporter: Stephan Ewen Assignee: Stephan Ewen Fix For: 1.5.0 These classes no longer seem necessary; the project JavaDocs build properly without them. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (FLINK-8761) Various improvements to the Quickstarts
Stephan Ewen created FLINK-8761: --- Summary: Various improvements to the Quickstarts Key: FLINK-8761 URL: https://issues.apache.org/jira/browse/FLINK-8761 Project: Flink Issue Type: Improvement Components: Quickstarts Reporter: Stephan Ewen Assignee: Stephan Ewen Fix For: 1.5.0 Various improvements to the Quickstarts to give a smoother out of the box experience. Broken down into the subtasks. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (FLINK-8738) Converge runtime dependency versions for 'scala-lang' and for 'com.typesafe:config'
Stephan Ewen created FLINK-8738: --- Summary: Converge runtime dependency versions for 'scala-lang' and for 'com.typesafe:config' Key: FLINK-8738 URL: https://issues.apache.org/jira/browse/FLINK-8738 Project: Flink Issue Type: Bug Components: Build System Reporter: Stephan Ewen Assignee: Stephan Ewen Fix For: 1.5.0 These dependencies are currently diverged: {code} Dependency convergence error for com.typesafe:config:1.3.0 paths to dependency are: +-com.daplatform.flink:txn-api:1.0-SNAPSHOT +-org.apache.flink:flink-streaming-java_2.11:1.5-SNAPSHOT +-org.apache.flink:flink-runtime_2.11:1.5-SNAPSHOT +-com.typesafe.akka:akka-actor_2.11:2.4.20 +-com.typesafe:config:1.3.0 and +-com.daplatform.flink:txn-api:1.0-SNAPSHOT +-org.apache.flink:flink-streaming-java_2.11:1.5-SNAPSHOT +-org.apache.flink:flink-runtime_2.11:1.5-SNAPSHOT +-com.typesafe.akka:akka-stream_2.11:2.4.20 +-com.typesafe:ssl-config-core_2.11:0.2.1 +-com.typesafe:config:1.2.0 {code} and {code} Dependency convergence error for org.scala-lang:scala-library:2.11.12 paths to dependency are: +-com.daplatform.flink:txn-api:1.0-SNAPSHOT +-org.apache.flink:flink-streaming-java_2.11:1.5-SNAPSHOT +-org.apache.flink:flink-runtime_2.11:1.5-SNAPSHOT +-org.scala-lang:scala-library:2.11.12 and +-com.daplatform.flink:txn-api:1.0-SNAPSHOT +-org.apache.flink:flink-streaming-java_2.11:1.5-SNAPSHOT +-org.apache.flink:flink-runtime_2.11:1.5-SNAPSHOT +-com.typesafe.akka:akka-actor_2.11:2.4.20 +-org.scala-lang:scala-library:2.11.11 and +-com.daplatform.flink:txn-api:1.0-SNAPSHOT +-org.apache.flink:flink-streaming-java_2.11:1.5-SNAPSHOT +-org.apache.flink:flink-runtime_2.11:1.5-SNAPSHOT +-com.typesafe.akka:akka-actor_2.11:2.4.20 +-org.scala-lang.modules:scala-java8-compat_2.11:0.7.0 +-org.scala-lang:scala-library:2.11.7 and +-com.daplatform.flink:txn-api:1.0-SNAPSHOT +-org.apache.flink:flink-streaming-java_2.11:1.5-SNAPSHOT +-org.apache.flink:flink-runtime_2.11:1.5-SNAPSHOT +-com.typesafe.akka:akka-stream_2.11:2.4.20 
+-org.scala-lang:scala-library:2.11.11 and +-com.daplatform.flink:txn-api:1.0-SNAPSHOT +-org.apache.flink:flink-streaming-java_2.11:1.5-SNAPSHOT +-org.apache.flink:flink-runtime_2.11:1.5-SNAPSHOT +-com.typesafe.akka:akka-stream_2.11:2.4.20 +-com.typesafe:ssl-config-core_2.11:0.2.1 +-org.scala-lang:scala-library:2.11.8 and +-com.daplatform.flink:txn-api:1.0-SNAPSHOT +-org.apache.flink:flink-streaming-java_2.11:1.5-SNAPSHOT +-org.apache.flink:flink-runtime_2.11:1.5-SNAPSHOT +-com.typesafe.akka:akka-stream_2.11:2.4.20 +-com.typesafe:ssl-config-core_2.11:0.2.1 +-org.scala-lang.modules:scala-parser-combinators_2.11:1.0.4 +-org.scala-lang:scala-library:2.11.6 and +-com.daplatform.flink:txn-api:1.0-SNAPSHOT +-org.apache.flink:flink-streaming-java_2.11:1.5-SNAPSHOT +-org.apache.flink:flink-runtime_2.11:1.5-SNAPSHOT +-com.typesafe.akka:akka-protobuf_2.11:2.4.20 +-org.scala-lang:scala-library:2.11.11 and +-com.daplatform.flink:txn-api:1.0-SNAPSHOT +-org.apache.flink:flink-streaming-java_2.11:1.5-SNAPSHOT +-org.apache.flink:flink-runtime_2.11:1.5-SNAPSHOT +-com.typesafe.akka:akka-slf4j_2.11:2.4.20 +-org.scala-lang:scala-library:2.11.11 and +-com.daplatform.flink:txn-api:1.0-SNAPSHOT +-org.apache.flink:flink-streaming-java_2.11:1.5-SNAPSHOT +-org.apache.flink:flink-runtime_2.11:1.5-SNAPSHOT +-org.clapper:grizzled-slf4j_2.11:1.0.2 +-org.scala-lang:scala-library:2.11.0 and +-com.daplatform.flink:txn-api:1.0-SNAPSHOT +-org.apache.flink:flink-streaming-java_2.11:1.5-SNAPSHOT +-org.apache.flink:flink-runtime_2.11:1.5-SNAPSHOT +-com.twitter:chill_2.11:0.7.4 +-org.scala-lang:scala-library:2.11.7 {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
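One common way to converge such versions is to pin them in the application's {{dependencyManagement}}; the block below is an illustrative sketch with the versions taken from the report above, to be adjusted to the actual build:

```xml
<!-- Illustrative sketch: pin the diverged artifacts to one version each
     in the application's parent pom. -->
<dependencyManagement>
  <dependencies>
    <dependency>
      <groupId>com.typesafe</groupId>
      <artifactId>config</artifactId>
      <version>1.3.0</version>
    </dependency>
    <dependency>
      <groupId>org.scala-lang</groupId>
      <artifactId>scala-library</artifactId>
      <version>2.11.12</version>
    </dependency>
  </dependencies>
</dependencyManagement>
```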
[jira] [Created] (FLINK-8729) Migrate JSONGenerator from Sling to Jackson
Stephan Ewen created FLINK-8729: --- Summary: Migrate JSONGenerator from Sling to Jackson Key: FLINK-8729 URL: https://issues.apache.org/jira/browse/FLINK-8729 Project: Flink Issue Type: Bug Components: Streaming Reporter: Stephan Ewen The {{org.apache.flink.streaming.api.graph.JSONGenerator}} uses Sling for JSON encoding, adding an extra dependency. All other Flink parts use a specially shaded Jackson dependency. Migrating the JSONGenerator would allow us to drop a dependency and make the code more homogeneous. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (FLINK-8727) Test instability in SlotPoolRpcTest.testCancelSlotAllocationWithoutResourceManager
Stephan Ewen created FLINK-8727: --- Summary: Test instability in SlotPoolRpcTest.testCancelSlotAllocationWithoutResourceManager Key: FLINK-8727 URL: https://issues.apache.org/jira/browse/FLINK-8727 Project: Flink Issue Type: Bug Components: Tests Reporter: Stephan Ewen Travis build and logs: https://api.travis-ci.org/v3/job/344253865/log.txt [~till.rohrmann] FYI. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (FLINK-8696) Remove JobManager local mode from the Unix Shell Scripts
Stephan Ewen created FLINK-8696: --- Summary: Remove JobManager local mode from the Unix Shell Scripts Key: FLINK-8696 URL: https://issues.apache.org/jira/browse/FLINK-8696 Project: Flink Issue Type: Sub-task Components: Startup Shell Scripts Reporter: Stephan Ewen Assignee: Stephan Ewen Fix For: 1.5.0 In order to work towards removing the local JobManager mode, the shell scripts need to be changed to not use/assume that mode any more. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (FLINK-8695) Move RocksDB State Backend from 'flink-contrib' to 'flink-state-backends'
Stephan Ewen created FLINK-8695: --- Summary: Move RocksDB State Backend from 'flink-contrib' to 'flink-state-backends' Key: FLINK-8695 URL: https://issues.apache.org/jira/browse/FLINK-8695 Project: Flink Issue Type: Improvement Components: State Backends, Checkpointing Reporter: Stephan Ewen Assignee: Stephan Ewen Fix For: 1.5.0 Having the RocksDB State Backend in {{flink-contrib}} is a bit of an understatement... -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (FLINK-8682) Make start/stop cluster scripts work without SSH for local HA setups
Stephan Ewen created FLINK-8682: --- Summary: Make start/stop cluster scripts work without SSH for local HA setups Key: FLINK-8682 URL: https://issues.apache.org/jira/browse/FLINK-8682 Project: Flink Issue Type: Improvement Components: Startup Shell Scripts Affects Versions: 1.4.0 Reporter: Stephan Ewen Assignee: Stephan Ewen Fix For: 1.5.0 Startup should work for a purely local (testing) cluster setup without SSH. While the shell scripts handle this correctly for TaskManagers, they don't handle it correctly for JobManagers. As a consequence, {{start-cluster.sh}} does not work without SSH when high availability is enabled. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (FLINK-8681) Remove planVisualizer.html move notice
Stephan Ewen created FLINK-8681: --- Summary: Remove planVisualizer.html move notice Key: FLINK-8681 URL: https://issues.apache.org/jira/browse/FLINK-8681 Project: Flink Issue Type: Improvement Components: Build System Affects Versions: 1.4.0 Reporter: Stephan Ewen Assignee: Stephan Ewen Fix For: 1.5.0 The {{planVisualizer.html}} for optimizer plans is no longer in the Flink distribution, but we still ship a notice there stating that the visualizer has moved to the website. That notice has been in place for many versions (since Flink 1.0) and can be removed now. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (FLINK-8680) Name printing sinks by default.
Stephan Ewen created FLINK-8680: --- Summary: Name printing sinks by default. Key: FLINK-8680 URL: https://issues.apache.org/jira/browse/FLINK-8680 Project: Flink Issue Type: Improvement Affects Versions: 1.4.0 Reporter: Stephan Ewen Assignee: Stephan Ewen Fix For: 1.5.0 The sinks that print to std. out and std. err show up as "Sink: Unnamed" in logs and the UI. They should be named "Print to Std. Out" and "Print to Std. Err" by default. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (FLINK-8549) Move TimerServiceOptions to TaskManagerOptions
Stephan Ewen created FLINK-8549: --- Summary: Move TimerServiceOptions to TaskManagerOptions Key: FLINK-8549 URL: https://issues.apache.org/jira/browse/FLINK-8549 Project: Flink Issue Type: Improvement Components: Local Runtime Reporter: Stephan Ewen Assignee: Stephan Ewen Fix For: 1.5.0 The {{TimerServiceOptions}} are in the wrong place (preventing generation of the config docs) and cause over-fragmentation of the options in the code base. I propose to simply move the one option from that class to the {{TaskManagerOptions}}, as it relates to task execution. Other shutdown-related options are already in there. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (FLINK-8548) Add Streaming State Machine Example
Stephan Ewen created FLINK-8548: --- Summary: Add Streaming State Machine Example Key: FLINK-8548 URL: https://issues.apache.org/jira/browse/FLINK-8548 Project: Flink Issue Type: Sub-task Components: Examples Reporter: Stephan Ewen Assignee: Stephan Ewen Fix For: 1.5.0 Add the example from https://github.com/StephanEwen/flink-demos/tree/master/streaming-state-machine to the Flink examples. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (FLINK-8540) FileStateHandles must not attempt to delete their parent directory.
Stephan Ewen created FLINK-8540: --- Summary: FileStateHandles must not attempt to delete their parent directory. Key: FLINK-8540 URL: https://issues.apache.org/jira/browse/FLINK-8540 Project: Flink Issue Type: Sub-task Components: State Backends, Checkpointing Reporter: Stephan Ewen Assignee: Stephan Ewen Fix For: 1.5.0 Currently, every file disposal checks if the parent directory is now empty, and deletes it if that is the case. That is not only inefficient, but prohibitively expensive on some systems, like Amazon S3. With the resolution of [FLINK-8539], this will no longer be necessary. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (FLINK-8539) Introduce "CompletedCheckpointStorageLocation" to explicitly handle disposal of checkpoint storage locations
Stephan Ewen created FLINK-8539: --- Summary: Introduce "CompletedCheckpointStorageLocation" to explicitly handle disposal of checkpoint storage locations Key: FLINK-8539 URL: https://issues.apache.org/jira/browse/FLINK-8539 Project: Flink Issue Type: Sub-task Components: State Backends, Checkpointing Reporter: Stephan Ewen Assignee: Stephan Ewen Fix For: 1.5.0 The storage location of completed checkpoints lacks a proper representation. Because of that, there is no place that can handle the deletion of a checkpoint directory, or the dropping of a checkpoint-specific table. The current workaround for file systems, for example, is that every file disposal checks if the parent directory is now empty, and deletes it if that is the case. That is not only inefficient, but prohibitively expensive on some systems, like Amazon S3. Properly representing the storage location for completed checkpoints allows us to add a disposal call for that location. That {{CompletedCheckpointStorageLocation}} can also be used to capture "external pointers", metadata, and even allow us to use custom serialization and deserialization of the metadata in the future, making the abstraction more extensible by allowing users to introduce new types of state handles. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
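A minimal sketch of what such an abstraction could look like; all names here are illustrative assumptions, not necessarily the final Flink API:

```java
import java.io.IOException;

// Hypothetical sketch of the proposed abstraction, not the final Flink API.
interface CompletedCheckpointStorageLocation {
    /** Pointer that external systems can use to locate the checkpoint. */
    String getExternalPointer();

    /** Disposes the whole storage location (directory, table, ...) in one call. */
    void disposeStorageLocation() throws IOException;
}

/** File-system flavored example: disposal drops the checkpoint directory as a unit. */
class FsCompletedCheckpointLocation implements CompletedCheckpointStorageLocation {
    private final String checkpointDir;

    FsCompletedCheckpointLocation(String checkpointDir) {
        this.checkpointDir = checkpointDir;
    }

    @Override
    public String getExternalPointer() {
        return checkpointDir;
    }

    @Override
    public void disposeStorageLocation() {
        // A real implementation would recursively delete checkpointDir here,
        // removing the need for per-file "is the parent empty now?" checks.
        System.out.println("disposing " + checkpointDir);
    }

    public static void main(String[] args) throws IOException {
        CompletedCheckpointStorageLocation loc =
                new FsCompletedCheckpointLocation("/checkpoints/chk-7");
        loc.disposeStorageLocation();
    }
}
```

With a single disposal call per location, cleanup no longer depends on probing directory emptiness after each file deletion.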
[jira] [Created] (FLINK-8531) Support separation of "Exclusive", "Shared" and "Task owned" state
Stephan Ewen created FLINK-8531: --- Summary: Support separation of "Exclusive", "Shared" and "Task owned" state Key: FLINK-8531 URL: https://issues.apache.org/jira/browse/FLINK-8531 Project: Flink Issue Type: Sub-task Components: State Backends, Checkpointing Reporter: Stephan Ewen Assignee: Stephan Ewen Fix For: 1.5.0 Currently, all state created at a certain checkpoint goes into the directory {{chk-id}}. With incremental checkpointing, some state is shared across checkpoints and is referenced by newer checkpoints. That way, old {{chk-id}} directories stay around, containing some shared chunks. That makes it hard, both for users and for cleanup hooks, to determine when a {{chk-x}} directory can be deleted. The same holds for state that can only ever be dropped by certain operators on the TaskManager, never by the JobManager / CheckpointCoordinator. Examples of that state are write-ahead logs, which need to be retained until the move to the target system is complete, which may in some cases be later than when the checkpoint that created them is disposed. I propose to introduce different scopes for state: - **EXCLUSIVE** is for state that belongs to one checkpoint only - **SHARED** is for state that is possibly part of multiple checkpoints - **TASKOWNED** is for state that must never be dropped by the JobManager. For file based checkpoint targets, I propose that we have the following directory layout: {code} /user-defined-checkpoint-dir | + --shared/ + --taskowned/ + --chk-1/ + --chk-2/ + --chk-3/ ... {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
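Under the proposed layout, resolving where a piece of state lives could look like this sketch (a hypothetical helper for illustration, not Flink's actual code):

```java
import java.nio.file.Paths;

class CheckpointScopes {
    enum Scope { EXCLUSIVE, SHARED, TASKOWNED }

    /**
     * Resolves where state of a given scope would live under the proposed
     * layout: exclusive state under chk-<id>/, while shared and task-owned
     * state go into sibling directories that outlive individual checkpoints.
     */
    static String resolve(String baseDir, Scope scope, long checkpointId) {
        switch (scope) {
            case EXCLUSIVE:
                return Paths.get(baseDir, "chk-" + checkpointId).toString();
            case SHARED:
                return Paths.get(baseDir, "shared").toString();
            default: // TASKOWNED
                return Paths.get(baseDir, "taskowned").toString();
        }
    }

    public static void main(String[] args) {
        System.out.println(resolve("/ckpts", Scope.EXCLUSIVE, 3)); // /ckpts/chk-3
        System.out.println(resolve("/ckpts", Scope.SHARED, 3));    // /ckpts/shared
    }
}
```

The point of the split is that deleting {{chk-3/}} then only removes state owned exclusively by checkpoint 3, never shared chunks or task-owned write-ahead logs.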
[jira] [Created] (FLINK-8499) Kryo must not be child-first loaded
Stephan Ewen created FLINK-8499: --- Summary: Kryo must not be child-first loaded Key: FLINK-8499 URL: https://issues.apache.org/jira/browse/FLINK-8499 Project: Flink Issue Type: Bug Components: Core Affects Versions: 1.4.0 Reporter: Stephan Ewen Assignee: Stephan Ewen Fix For: 1.5.0, 1.4.1 Kryo classes are part of Flink API and hence move between Flink's core (serializers) and the user-application (Avro-Kryo-utils). Duplicating the kryo dependency through reversed classloading yields problems. If Kryo is in the user application jar, together with Avro, the following error happens: (this seems a semi-bug in the JVM, because this should clearly be a {{ClassCastException}}, not such a cryptic byte code error). {code} java.lang.VerifyError: Bad type on operand stack Exception Details: Location: org/apache/flink/formats/avro/utils/AvroKryoSerializerUtils.addAvroGenericDataArrayRegistration(Ljava/util/LinkedHashMap;)V @23: invokespecial Reason: Type 'org/apache/flink/api/java/typeutils/runtime/kryo/Serializers$SpecificInstanceCollectionSerializerForArrayList' (current frame, stack[7]) is not assignable to 'com/esotericsoftware/kryo/Serializer' Current Frame: bci: @23 flags: { } locals: { 'org/apache/flink/formats/avro/utils/AvroKryoSerializerUtils', 'java/util/LinkedHashMap' } stack: { 'java/util/LinkedHashMap', 'java/lang/String', uninitialized 6, uninitialized 6, 'java/lang/Class', uninitialized 12, uninitialized 12, 'org/apache/flink/api/java/typeutils/runtime/kryo/Serializers$SpecificInstanceCollectionSerializerForArrayList' } Bytecode: 0x000: 2b12 05b6 000b bb00 0c59 1205 bb00 0d59 0x010: bb00 0659 b700 0eb7 000f b700 10b6 0011 0x020: 57b1 {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (FLINK-8461) Wrong logger configurations for shaded Netty
Stephan Ewen created FLINK-8461: --- Summary: Wrong logger configurations for shaded Netty Key: FLINK-8461 URL: https://issues.apache.org/jira/browse/FLINK-8461 Project: Flink Issue Type: Bug Components: Logging Affects Versions: 1.4.0 Reporter: Stephan Ewen Assignee: Stephan Ewen Fix For: 1.5.0, 1.4.1 We started shading Akka's Netty in Flink 1.4. The logger configurations (log4j.properties, logback.xml) were not updated to the shaded class names. One result of this is incorrect/misleading error logging of the Netty handlers during shutdown, which pollute the logs and cause Yarn end-to-end tests to fail. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (FLINK-8374) Unstable Yarn tests due to Akka Shutdown Exception Logging
Stephan Ewen created FLINK-8374: --- Summary: Unstable Yarn tests due to Akka Shutdown Exception Logging Key: FLINK-8374 URL: https://issues.apache.org/jira/browse/FLINK-8374 Project: Flink Issue Type: Bug Components: Tests Affects Versions: 1.4.0 Reporter: Stephan Ewen Assignee: Stephan Ewen Priority: Critical Fix For: 1.5.0 Akka may log the following in some cases during shutdown: {{java.util.concurrent.RejectedExecutionException: Worker has already been shutdown}} The Yarn tests search the logs for unexpected exceptions and fail when encountering that exception. We should whitelist it, as it is not a problem, merely an Akka shutdown artifact. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (FLINK-8373) Inconsistencies in some FileSystem directory functions
Stephan Ewen created FLINK-8373: --- Summary: Inconsistencies in some FileSystem directory functions Key: FLINK-8373 URL: https://issues.apache.org/jira/browse/FLINK-8373 Project: Flink Issue Type: Bug Components: Core Affects Versions: 1.4.0 Reporter: Stephan Ewen Assignee: Stephan Ewen Priority: Minor Fix For: 1.5.0 There are some minor differences in the behaviors of some file system functions, like {{mkdirs()}}. On some filesystems, it tolerates existing directories or files in place of parent directories. Some return false in an error case, some throw an exception. I encountered this while writing tests for the file based state backends. We should harmonize the behavior of {{FileSystem.mkdirs()}}. I suggest adopting the behavior used by HDFS, which seems the most correct. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
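The HDFS-style contract could be sketched as follows (a hypothetical helper over {{java.nio.file}}, only to illustrate the proposed semantics): succeed silently if the directory already exists, create missing parents, and throw instead of returning false when a regular file is in the way.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

class Mkdirs {
    /**
     * HDFS-like mkdirs semantics (sketch): tolerate existing directories,
     * create all missing parents, and fail loudly when a regular file sits
     * where a directory is needed.
     */
    static boolean mkdirs(Path dir) {
        try {
            if (Files.isDirectory(dir)) {
                return true; // already there: success, not an error
            }
            if (Files.exists(dir)) {
                throw new IOException(dir + " exists but is not a directory");
            }
            Files.createDirectories(dir); // creates missing parents too
            return true;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Tiny demo: creating the same nested path twice must succeed both times. */
    static boolean demo() {
        try {
            Path tmp = Files.createTempDirectory("mkdirs-demo");
            Path nested = tmp.resolve("a").resolve("b");
            return mkdirs(nested) && mkdirs(nested);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo()); // true
    }
}
```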
[jira] [Created] (FLINK-8268) Test instability for 'TwoPhaseCommitSinkFunctionTest'
Stephan Ewen created FLINK-8268: --- Summary: Test instability for 'TwoPhaseCommitSinkFunctionTest' Key: FLINK-8268 URL: https://issues.apache.org/jira/browse/FLINK-8268 Project: Flink Issue Type: Bug Components: Tests Affects Versions: 1.5.0 Reporter: Stephan Ewen Priority: Critical The following exception / failure message occurs. {code} Tests run: 5, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 1.824 sec <<< FAILURE! - in org.apache.flink.streaming.api.functions.sink.TwoPhaseCommitSinkFunctionTest testIgnoreCommitExceptionDuringRecovery(org.apache.flink.streaming.api.functions.sink.TwoPhaseCommitSinkFunctionTest) Time elapsed: 0.068 sec <<< ERROR! java.lang.Exception: Could not complete snapshot 0 for operator MockTask (1/1). at java.io.FileOutputStream.writeBytes(Native Method) at java.io.FileOutputStream.write(FileOutputStream.java:326) at sun.nio.cs.StreamEncoder.writeBytes(StreamEncoder.java:221) at sun.nio.cs.StreamEncoder.implFlushBuffer(StreamEncoder.java:291) at sun.nio.cs.StreamEncoder.implFlush(StreamEncoder.java:295) at sun.nio.cs.StreamEncoder.flush(StreamEncoder.java:141) at java.io.OutputStreamWriter.flush(OutputStreamWriter.java:229) at java.io.BufferedWriter.flush(BufferedWriter.java:254) at org.apache.flink.streaming.api.functions.sink.TwoPhaseCommitSinkFunctionTest$FileBasedSinkFunction.preCommit(TwoPhaseCommitSinkFunctionTest.java:313) at org.apache.flink.streaming.api.functions.sink.TwoPhaseCommitSinkFunctionTest$FileBasedSinkFunction.preCommit(TwoPhaseCommitSinkFunctionTest.java:288) at org.apache.flink.streaming.api.functions.sink.TwoPhaseCommitSinkFunction.snapshotState(TwoPhaseCommitSinkFunction.java:290) at org.apache.flink.streaming.util.functions.StreamingFunctionUtils.trySnapshotFunctionState(StreamingFunctionUtils.java:118) at org.apache.flink.streaming.util.functions.StreamingFunctionUtils.snapshotFunctionState(StreamingFunctionUtils.java:99) at 
org.apache.flink.streaming.api.operators.AbstractUdfStreamOperator.snapshotState(AbstractUdfStreamOperator.java:90) at org.apache.flink.streaming.api.operators.AbstractStreamOperator.snapshotState(AbstractStreamOperator.java:357) at org.apache.flink.streaming.util.AbstractStreamOperatorTestHarness.snapshot(AbstractStreamOperatorTestHarness.java:459) at org.apache.flink.streaming.api.functions.sink.TwoPhaseCommitSinkFunctionTest.testIgnoreCommitExceptionDuringRecovery(TwoPhaseCommitSinkFunctionTest.java:208) {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (FLINK-8264) Add Scala to the parent-first loading patterns
Stephan Ewen created FLINK-8264: --- Summary: Add Scala to the parent-first loading patterns Key: FLINK-8264 URL: https://issues.apache.org/jira/browse/FLINK-8264 Project: Flink Issue Type: Improvement Components: Core Affects Versions: 1.4.0 Reporter: Stephan Ewen Assignee: Stephan Ewen Fix For: 1.5.0, 1.4.1 A confusing experience happens when users accidentally package the Scala Library into their jar file. The reversed class loading duplicates Scala's classes, leading to exceptions like the one below. By adding {{scala.}} to the default 'parent-first-patterns' we can improve the user experience in such situations. Exception Stack Trace: {code} java.lang.ClassCastException: cannot assign instance of org.peopleinmotion.TestFunction$$anonfun$1 to field org.apache.flink.streaming.api.scala.DataStream$$anon$7.cleanFun$6 of type scala.Function1 in instance of org.apache.flink.streaming.api.scala.DataStream$$anon$7 at java.io.ObjectStreamClass$FieldReflector.setObjFieldValues(ObjectStreamClass.java:2233) at java.io.ObjectStreamClass.setObjFieldValues(ObjectStreamClass.java:1405) at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2288) at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2206) at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2064) at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1568) at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2282) at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2206) at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2064) at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1568) at java.io.ObjectInputStream.readObject(ObjectInputStream.java:428) at org.apache.flink.util.InstantiationUtil.deserializeObject(InstantiationUtil.java:290) at org.apache.flink.util.InstantiationUtil.readObjectFromConfig(InstantiationUtil.java:248) at 
org.apache.flink.streaming.api.graph.StreamConfig.getStreamOperator(StreamConfig.java:220) ... 6 more {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029)
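The change amounts to extending the default parent-first pattern list. On Flink 1.4 the same effect can be approximated manually in {{flink-conf.yaml}}; option names are as of 1.4, and the default pattern list shown here is abbreviated:

```
# flink-conf.yaml -- add "scala." to the parent-first patterns
classloader.resolve-order: child-first
classloader.parent-first-patterns: java.;scala.;org.apache.flink.;org.slf4j;org.apache.log4j
```

Classes matching these prefixes are always loaded from the framework classpath, so a Scala library accidentally bundled in the user jar can no longer shadow the framework's copy.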
[jira] [Created] (FLINK-8263) Wrong packaging of flink-core in scala quickstart
Stephan Ewen created FLINK-8263: --- Summary: Wrong packaging of flink-core in scala quickstart Key: FLINK-8263 URL: https://issues.apache.org/jira/browse/FLINK-8263 Project: Flink Issue Type: Bug Components: Quickstarts Affects Versions: 1.4.0 Reporter: Stephan Ewen Assignee: Stephan Ewen Priority: Blocker Fix For: 1.5.0, 1.4.1 The scala quickstart currently does not set {{flink-core}} to "provided" in the "build-jar" profile. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (FLINK-8261) Typos in the shading exclusion for jsr305 in the quickstarts
Stephan Ewen created FLINK-8261: --- Summary: Typos in the shading exclusion for jsr305 in the quickstarts Key: FLINK-8261 URL: https://issues.apache.org/jira/browse/FLINK-8261 Project: Flink Issue Type: Bug Components: Quickstarts Affects Versions: 1.4.0 Reporter: Stephan Ewen Assignee: Stephan Ewen This affects both the Java and the Scala quickstarts. The typo is {{findbgs}} instead of {{findbugs}}. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (FLINK-8196) Fix Hadoop Servlet Dependency Exclusion
Stephan Ewen created FLINK-8196: --- Summary: Fix Hadoop Servlet Dependency Exclusion Key: FLINK-8196 URL: https://issues.apache.org/jira/browse/FLINK-8196 Project: Flink Issue Type: Bug Components: Build System Reporter: Stephan Ewen Assignee: Stephan Ewen Priority: Blocker Fix For: 1.4.0 We currently exclude the `javax.servlet` API dependency, which is unfortunately needed as a core dependency by Hadoop 2.7. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (FLINK-8125) Support limiting the number of open FileSystem connections
Stephan Ewen created FLINK-8125: --- Summary: Support limiting the number of open FileSystem connections Key: FLINK-8125 URL: https://issues.apache.org/jira/browse/FLINK-8125 Project: Flink Issue Type: Improvement Components: Core Reporter: Stephan Ewen Assignee: Stephan Ewen Fix For: 1.4.1, 1.5.0 We need a way to limit the number of streams that Flink FileSystems concurrently open. For example, for very small HDFS clusters with few RPC handlers, a large Flink job trying to build up many connections during a checkpoint causes failures due to rejected connections. I propose to add a file system that can wrap another existing file system and limit the number of concurrently open streams. The file system may track the progress of streams and close streams that have been inactive for too long, to prevent stalled streams from taking up the complete pool. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
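A sketch of the wrapping idea (class and method names are hypothetical, not Flink's eventual implementation): a semaphore caps the number of concurrently open streams, and closing a stream returns its permit to the pool.

```java
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.concurrent.Semaphore;

/** Hypothetical connection-limiting wrapper around another file system's streams. */
class LimitedFs {
    private final Semaphore permits;

    LimitedFs(int maxOpenStreams) {
        this.permits = new Semaphore(maxOpenStreams);
    }

    /** Blocks until a slot is free, then hands out a permit-tracking stream. */
    InputStream open(InputStream raw) throws InterruptedException {
        permits.acquire();
        return new FilterInputStream(raw) {
            private boolean released;

            @Override
            public void close() throws IOException {
                try {
                    super.close();
                } finally {
                    if (!released) { // guard against double-close
                        released = true;
                        permits.release(); // closing a stream frees the slot
                    }
                }
            }
        };
    }

    int freeSlots() {
        return permits.availablePermits();
    }

    public static void main(String[] args) throws Exception {
        LimitedFs fs = new LimitedFs(2);
        InputStream in = fs.open(new ByteArrayInputStream(new byte[]{1, 2, 3}));
        System.out.println(fs.freeSlots()); // 1
        in.close();
        System.out.println(fs.freeSlots()); // 2
    }
}
```

The inactivity-timeout part of the proposal would sit on top of this: a watchdog that force-closes streams holding a permit without making progress.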
[jira] [Created] (FLINK-7997) Avro should be always in the user code
Stephan Ewen created FLINK-7997: --- Summary: Avro should be always in the user code Key: FLINK-7997 URL: https://issues.apache.org/jira/browse/FLINK-7997 Project: Flink Issue Type: Improvement Reporter: Stephan Ewen Assignee: Stephan Ewen Fix For: 1.4.0 Having Avro in the user code space makes it possible for users to use different Avro versions than the ones pulled in by an overloaded classpath (for example when having Hadoop in the classpath). This is possible through the new child-first classloading in Flink 1.4. Also, this should fix the problem of "X cannot be cast to X", because Avro classes will be scoped to the user code class loader, and the Avro schema cache will not be JVM-wide. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (FLINK-7973) Fix service shading relocation for S3 file systems
Stephan Ewen created FLINK-7973: --- Summary: Fix service shading relocation for S3 file systems Key: FLINK-7973 URL: https://issues.apache.org/jira/browse/FLINK-7973 Project: Flink Issue Type: Bug Reporter: Stephan Ewen Assignee: Stephan Ewen Priority: Blocker Fix For: 1.4.0 The shade plugin currently relocates services incorrectly, applying relocation patterns multiple times. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (FLINK-7972) Move SerializationSchema to flink-core
Stephan Ewen created FLINK-7972: --- Summary: Move SerializationSchema to flink-core Key: FLINK-7972 URL: https://issues.apache.org/jira/browse/FLINK-7972 Project: Flink Issue Type: Improvement Components: Streaming Reporter: Stephan Ewen Assignee: Stephan Ewen Fix For: 1.4.0 The {{SerializationSchema}} and its related classes are currently in {{flink-streaming-java}}. API level projects that depend on those classes hence pull in a dependency on runtime classes. For example, this would be required in order to make {{flink-avro}} independent of runtime dependencies and Scala versions, same in the future for thrift format support, for HBase connectors, etc. This should not be API breaking since we can keep the classes in the same namespace and only move them "upstream" in the dependency structure, or we can keep classes in the original namespace that extend the moved classes in {{flink-core}}. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
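The non-breaking variant could look like this sketch (a single illustrative file with made-up names; real code would keep the two interfaces in their respective {{flink-core}} and {{flink-streaming-java}} packages):

```java
import java.nio.charset.StandardCharsets;

// Would live in flink-core after the move.
interface SerializationSchema<T> {
    byte[] serialize(T element);
}

// Would stay in the old streaming namespace, deprecated, purely so that
// existing user code compiles unchanged against the old type.
@Deprecated
interface LegacySerializationSchema<T> extends SerializationSchema<T> { }

// Example implementation working against the relocated interface.
class SimpleStringSchema implements SerializationSchema<String> {
    @Override
    public byte[] serialize(String element) {
        return element.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(new SimpleStringSchema().serialize("flink").length); // 5
    }
}
```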
[jira] [Created] (FLINK-7968) Deduplicate serializer classes between runtime and queryable state
Stephan Ewen created FLINK-7968: --- Summary: Deduplicate serializer classes between runtime and queryable state Key: FLINK-7968 URL: https://issues.apache.org/jira/browse/FLINK-7968 Project: Flink Issue Type: Bug Components: Queryable State Reporter: Stephan Ewen Assignee: Stephan Ewen Fix For: 1.4.0 Some serializer classes were duplicated into {{flink-queryable-state}} to avoid a dependency on {{flink-runtime}}. The proper solution here is to move the classes to the shared {{flink-core}} project, because these classes are actually useful in a series of API utilities and they do not have any dependency on other flink classes at all. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (FLINK-7925) Add CheckpointingOptions
Stephan Ewen created FLINK-7925: --- Summary: Add CheckpointingOptions Key: FLINK-7925 URL: https://issues.apache.org/jira/browse/FLINK-7925 Project: Flink Issue Type: Sub-task Components: State Backends, Checkpointing Reporter: Stephan Ewen Assignee: Stephan Ewen Fix For: 1.4.0 The CheckpointingOptions should consolidate all checkpointing and state backend-related settings that were previously split across different classes. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (FLINK-7924) Fix incorrect names of checkpoint options
Stephan Ewen created FLINK-7924: --- Summary: Fix incorrect names of checkpoint options Key: FLINK-7924 URL: https://issues.apache.org/jira/browse/FLINK-7924 Project: Flink Issue Type: Sub-task Components: State Backends, Checkpointing Reporter: Stephan Ewen Assignee: Stephan Ewen Fix For: 1.4.0 Checkpoint options are incorrectly always called 'FULL_CHECKPOINT' when actually, checkpoints may always be incremental and only savepoints have to be full and self-contained. Initially, we planned to add options for multiple kinds of checkpoints, like checkpoints that were forced to be full, and checkpoints that were incremental. That is not necessary at this point. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (FLINK-7842) Shade jackson (org.codehaus.jackson) in flink-shaded-hadoop2
Stephan Ewen created FLINK-7842: --- Summary: Shade jackson (org.codehaus.jackson) in flink-shaded-hadoop2 Key: FLINK-7842 URL: https://issues.apache.org/jira/browse/FLINK-7842 Project: Flink Issue Type: Sub-task Components: Build System Reporter: Stephan Ewen Assignee: Stephan Ewen Fix For: 1.4.0 -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (FLINK-7841) Add docs for Flink's S3 support
Stephan Ewen created FLINK-7841: --- Summary: Add docs for Flink's S3 support Key: FLINK-7841 URL: https://issues.apache.org/jira/browse/FLINK-7841 Project: Flink Issue Type: Bug Components: Documentation Reporter: Stephan Ewen Assignee: Stephan Ewen Fix For: 1.4.0 -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (FLINK-7840) Shade Akka's Netty Dependency
Stephan Ewen created FLINK-7840: --- Summary: Shade Akka's Netty Dependency Key: FLINK-7840 URL: https://issues.apache.org/jira/browse/FLINK-7840 Project: Flink Issue Type: Improvement Components: Build System Reporter: Stephan Ewen Assignee: Stephan Ewen Fix For: 1.4.0 In order to avoid clashes between different Netty versions we should shade Akka's Netty away. These dependency version clashes manifest themselves in very subtle ways, like occasional deadlocks. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (FLINK-7768) Load File Systems via Java Service abstraction
Stephan Ewen created FLINK-7768: --- Summary: Load File Systems via Java Service abstraction Key: FLINK-7768 URL: https://issues.apache.org/jira/browse/FLINK-7768 Project: Flink Issue Type: Improvement Components: Core Reporter: Stephan Ewen Fix For: 1.4.0 -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (FLINK-7767) Avoid loading Hadoop conf dynamically at runtime
Stephan Ewen created FLINK-7767: --- Summary: Avoid loading Hadoop conf dynamically at runtime Key: FLINK-7767 URL: https://issues.apache.org/jira/browse/FLINK-7767 Project: Flink Issue Type: Improvement Components: Streaming Connectors Reporter: Stephan Ewen Assignee: Stephan Ewen Fix For: 1.4.0 The bucketing sink dynamically loads the Hadoop configuration in various places. The result of that configuration is not always predictable, as it tries to automagically discover the Hadoop config files. A better approach is to rely on the Flink configuration to find the Hadoop configuration, or to directly use the Hadoop configuration used by the Hadoop file systems. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (FLINK-7766) Remove obsolete reflection for hflush on HDFS
Stephan Ewen created FLINK-7766: --- Summary: Remove obsolete reflection for hflush on HDFS Key: FLINK-7766 URL: https://issues.apache.org/jira/browse/FLINK-7766 Project: Flink Issue Type: Improvement Components: Streaming Connectors Reporter: Stephan Ewen Assignee: Stephan Ewen Fix For: 1.4.0 This code originally existed for compatibility with Hadoop 1. Since Hadoop 1 support is dropped, this is no longer necessary. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (FLINK-7662) Remove unnecessary packaged licenses
Stephan Ewen created FLINK-7662: --- Summary: Remove unnecessary packaged licenses Key: FLINK-7662 URL: https://issues.apache.org/jira/browse/FLINK-7662 Project: Flink Issue Type: Improvement Components: Build System Reporter: Stephan Ewen Assignee: Stephan Ewen Fix For: 1.4.0 With the new shading approach, we no longer shade ASM into Flink artifacts, so we do not need to package the ASM license into those artifacts any more. Instead, a shaded ASM artifact already containing a packaged license is used in the distribution build. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (FLINK-7420) Move all Avro code to flink-avro
Stephan Ewen created FLINK-7420: --- Summary: Move all Avro code to flink-avro Key: FLINK-7420 URL: https://issues.apache.org/jira/browse/FLINK-7420 Project: Flink Issue Type: Improvement Components: Build System Reporter: Stephan Ewen Fix For: 1.4.0 Currently, the {{flink-avro}} project is a shell with some tests and mostly duplicate and dead code. The classes that use Avro are distributed quite wildly through the code base, and introduce multiple direct dependencies on Avro in a messy way. We should move all Avro related classes to {{flink-avro}}, and give {{flink-avro}} a dependency on {{flink-core}} and {{flink-streaming-java}}. - {{AvroTypeInfo}} - {{AvroSerializer}} - {{AvroRowSerializationSchema}} - {{AvroRowDeserializationSchema}} To be able to move the Avro serialization code from {{flink-core}} to {{flink-avro}}, we need to load the {{AvroTypeInformation}} reflectively, similar to how we load the {{WritableTypeInfo}} for Hadoop. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (FLINK-7419) Shade jackson dependency in flink-avro
Stephan Ewen created FLINK-7419: --- Summary: Shade jackson dependency in flink-avro Key: FLINK-7419 URL: https://issues.apache.org/jira/browse/FLINK-7419 Project: Flink Issue Type: Sub-task Components: Build System Reporter: Stephan Ewen Fix For: 1.4.0 Avro uses {{org.codehaus.jackson}}, which also exists in multiple incompatible versions. We should shade it to {{org.apache.flink.shaded.avro.org.codehaus.jackson}}. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (FLINK-7417) Create flink-shaded-jackson
Stephan Ewen created FLINK-7417: --- Summary: Create flink-shaded-jackson Key: FLINK-7417 URL: https://issues.apache.org/jira/browse/FLINK-7417 Project: Flink Issue Type: Sub-task Components: Build System Reporter: Stephan Ewen Fix For: 1.4.0 The {{com.fasterxml:jackson}} library is another culprit of frequent conflicts that we need to shade away. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (FLINK-7418) Replace all uses of jackson with flink-shaded-jackson
Stephan Ewen created FLINK-7418: --- Summary: Replace all uses of jackson with flink-shaded-jackson Key: FLINK-7418 URL: https://issues.apache.org/jira/browse/FLINK-7418 Project: Flink Issue Type: Sub-task Components: Build System Reporter: Stephan Ewen Fix For: 1.4.0 Jackson is currently used to create JSON responses in the web UI, in the future possibly for the client REST communication. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (FLINK-7266) Don't attempt to delete parent directory on S3
Stephan Ewen created FLINK-7266: --- Summary: Don't attempt to delete parent directory on S3 Key: FLINK-7266 URL: https://issues.apache.org/jira/browse/FLINK-7266 Project: Flink Issue Type: Bug Components: Core Affects Versions: 1.3.1 Reporter: Stephan Ewen Assignee: Stephan Ewen Fix For: 1.4.0, 1.3.2 Currently, every attempted release of an S3 state object also checks if the "parent directory" is empty and then tries to delete it. Not only is that unnecessary on S3, but it is prohibitively expensive and for example causes S3 to throttle calls by the JobManager on checkpoint cleanup. The {{FileState}} must only attempt parent directory cleanup when operating against real file systems, not when operating against object stores. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (FLINK-7265) FileSystems should describe their kind and consistency level
Stephan Ewen created FLINK-7265: --- Summary: FileSystems should describe their kind and consistency level Key: FLINK-7265 URL: https://issues.apache.org/jira/browse/FLINK-7265 Project: Flink Issue Type: Improvement Components: Core Affects Versions: 1.3.1 Reporter: Stephan Ewen Assignee: Stephan Ewen Fix For: 1.4.0, 1.3.2 Currently, all {{FileSystem}} types look the same to Flink. However, certain operations should only be executed on certain kinds of file systems. For example, it makes no sense to attempt to delete an empty parent directory on S3, because there are no such things as directories, only hierarchical naming in the keys (file names). -- This message was sent by Atlassian JIRA (v6.4.14#64029)
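A sketch of the self-description idea (names here are hypothetical, not the kinds Flink eventually shipped): file systems advertise what they are, and callers gate directory-oriented operations on that.

```java
/** Hypothetical sketch: file systems describe their kind to callers. */
class FsKinds {
    enum FileSystemKind {
        FILE_SYSTEM,  // real hierarchical file system (local, HDFS, ...)
        OBJECT_STORE  // key/value store with path-like keys (S3, ...)
    }

    /** Empty-parent-directory cleanup only makes sense on real file systems. */
    static boolean shouldCleanupParentDirectory(FileSystemKind kind) {
        return kind == FileSystemKind.FILE_SYSTEM;
    }

    public static void main(String[] args) {
        System.out.println(shouldCleanupParentDirectory(FileSystemKind.OBJECT_STORE)); // false
        System.out.println(shouldCleanupParentDirectory(FileSystemKind.FILE_SYSTEM)); // true
    }
}
```

With this check in place, the S3 path from FLINK-7266 simply skips the "is the parent empty?" probe instead of issuing expensive list/delete calls.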
[jira] [Created] (FLINK-7263) Improve Pull Request Template
Stephan Ewen created FLINK-7263: --- Summary: Improve Pull Request Template Key: FLINK-7263 URL: https://issues.apache.org/jira/browse/FLINK-7263 Project: Flink Issue Type: Improvement Components: Documentation Reporter: Stephan Ewen Assignee: Stephan Ewen As discussed in the mailing list, the suggestion is to update the pull request template as follows: *Thank you very much for contributing to Apache Flink - we are happy that you want to help us improve Flink. To help the community review your contribution in the best possible way, please go through the checklist below, which will get the contribution into a shape in which it can be best reviewed.* *Please understand that we do not do this to make contributions to Flink a hassle. In order to uphold a high standard of quality for code contributions, while at the same time managing a large number of contributions, we need contributors to prepare the contributions well, and give reviewers enough contextual information for the review. Please also understand that contributions that do not follow this guide will take longer to review and thus typically be picked up with lower priority by the community.* ## Contribution Checklist - Make sure that the pull request corresponds to a [JIRA issue](https://issues.apache.org/jira/projects/FLINK/issues). Exceptions are made for typos in JavaDoc or documentation files, which need no JIRA issue. - Name the pull request in the form "[FLINK-1234] [component] Title of the pull request", where *FLINK-1234* should be replaced by the actual issue number. Skip *component* if you are unsure about which is the best component. Typo fixes that have no associated JIRA issue should be named following this pattern: `[hotfix] [docs] Fix typo in event time introduction` or `[hotfix] [javadocs] Expand JavaDoc for PunctuatedWatermarkGenerator`. - Fill out the template below to describe the changes contributed by the pull request. That will give reviewers the context they need to do the review. 
- Make sure that the change passes the automated tests, i.e., `mvn clean verify` - Each pull request should address only one issue, not mix up code from multiple issues. - Each commit in the pull request has a meaningful commit message (including the JIRA id) - Once all items of the checklist are addressed, remove the above text and this checklist, leaving only the filled out template below. **(The sections below can be removed for hotfixes of typos)** ## What is the purpose of the change *(For example: This pull request makes task deployment go through the blob server, rather than through RPC. That way we avoid re-transferring them on each deployment (during recovery).)* ## Brief change log *(for example:)* - *The TaskInfo is stored in the blob store on job creation time as a persistent artifact* - *Deployments RPC transmits only the blob storage reference* - *TaskManagers retrieve the TaskInfo from the blob cache* ## Verifying this change *(Please pick either of the following options)* This change is a trivial rework / code cleanup without any test coverage. *(or)* This change is already covered by existing tests, such as *(please describe tests)*. 
*(or)* This change added tests and can be verified as follows: *(example:)* - *Added integration tests for end-to-end deployment with large payloads (100MB)* - *Extended integration test for recovery after master (JobManager) failure* - *Added test that validates that TaskInfo is transferred only once across recoveries* - *Manually verified the change by running a 4 node cluser with 2 JobManagers and 4 TaskManagers, a stateful streaming program, and killing one JobManager and to TaskManagers during the execution, verifying that recovery happens correctly.* ## Does this pull request potentially affect one of the following parts: - Dependencies (does it add or upgrade a dependency): **(yes / no)** - The public API, i.e., is any changed class annotated with `@Public(Evolving)`: **(yes / no)** - The serializers: **(yes / no / don't know)** - The runtime per-record code paths (performance sensitive): **(yes / no / don't know)** - Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Yarn/Mesos, ZooKeeper: **(yes / no / don't know)**: ## Documentation - Does this pull request introduce a new feature? **(yes / no)** - If yes, how is the feature documented? **(not applicable / docs / JavaDocs / not documented)** -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (FLINK-7253) Remove all 'assume Java 8' code in tests
Stephan Ewen created FLINK-7253: --- Summary: Remove all 'assume Java 8' code in tests Key: FLINK-7253 URL: https://issues.apache.org/jira/browse/FLINK-7253 Project: Flink Issue Type: Sub-task Reporter: Stephan Ewen
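For context, a minimal sketch of the kind of guard this issue targets (the class and method names here are illustrative, not taken from the Flink codebase): tests that should only run on Java 8 typically probe the JVM version at runtime and skip otherwise. Once the whole build requires Java 8, such guards become dead code and can be deleted.

```java
// Illustrative only: the style of "assume Java 8" check that becomes
// obsolete once the build itself requires Java 8.
public class JavaVersionGuard {

    // java.specification.version is "1.8" on Java 8 and a plain
    // major number ("9", "10", ...) on later releases.
    static boolean isJava8OrHigher() {
        String spec = System.getProperty("java.specification.version");
        if (spec.startsWith("1.")) {
            return Integer.parseInt(spec.substring(2)) >= 8;
        }
        return Integer.parseInt(spec) >= 8;
    }

    public static void main(String[] args) {
        // A test harness would skip (rather than fail) when this is false.
        System.out.println(isJava8OrHigher());
    }
}
```

In JUnit-based tests the same check is usually wired through `Assume.assumeTrue(...)`, which marks the test as skipped instead of failed.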
[jira] [Created] (FLINK-7251) Merge the flink-java8 project into flink-core
Stephan Ewen created FLINK-7251: --- Summary: Merge the flink-java8 project into flink-core Key: FLINK-7251 URL: https://issues.apache.org/jira/browse/FLINK-7251 Project: Flink Issue Type: Sub-task Reporter: Stephan Ewen
[jira] [Created] (FLINK-7252) Remove Flink Futures or back them by CompletableFutures
Stephan Ewen created FLINK-7252: --- Summary: Remove Flink Futures or back them by CompletableFutures Key: FLINK-7252 URL: https://issues.apache.org/jira/browse/FLINK-7252 Project: Flink Issue Type: Sub-task Reporter: Stephan Ewen
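The target of this migration is Java 8's built-in `java.util.concurrent.CompletableFuture`. As a minimal sketch (not Flink code) of why a custom Future abstraction becomes redundant: composition, transformation, and error handling are all available out of the box.

```java
import java.util.concurrent.CompletableFuture;

// Minimal sketch (not Flink code): CompletableFuture already provides the
// chaining and callback facilities a custom Future type would have to add.
public class CompletableFutureSketch {

    // Builds the chained future; factored out so it can be reused.
    static CompletableFuture<Integer> answer() {
        return CompletableFuture
                .supplyAsync(() -> 21)      // asynchronous supplier
                .thenApply(x -> x * 2)      // synchronous transformation
                .exceptionally(t -> -1);    // error-handling hook
    }

    public static void main(String[] args) throws Exception {
        System.out.println(answer().get()); // prints 42
    }
}
```

Backing an existing Future interface by `CompletableFuture` would mean delegating its methods to an internal `CompletableFuture` instance, so callers can migrate incrementally.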
[jira] [Created] (FLINK-7249) Bump Java version in build plugins
Stephan Ewen created FLINK-7249: --- Summary: Bump Java version in build plugins Key: FLINK-7249 URL: https://issues.apache.org/jira/browse/FLINK-7249 Project: Flink Issue Type: Sub-task Reporter: Stephan Ewen