[jira] [Created] (FLINK-10033) Let Task release reference to Invokable on shutdown

2018-08-02 Thread Stephan Ewen (JIRA)
Stephan Ewen created FLINK-10033:
---


 Summary: Let Task release reference to Invokable on shutdown
 Key: FLINK-10033
 URL: https://issues.apache.org/jira/browse/FLINK-10033
 Project: Flink
  Issue Type: Bug
  Components: TaskManager
Affects Versions: 1.5.2
Reporter: Stephan Ewen
Assignee: Stephan Ewen
 Fix For: 1.5.3, 1.6.0


References to Task objects may under some conditions linger longer than for the 
lifetime of the task. For example, in case of local network channels, the 
receiving task may have a reference to the object of the task that produced the 
data.

To prevent memory leaks, the Task needs to release all references to its 
AbstractInvokable when it shuts down or cancels.
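A minimal sketch of the idea (class and field names hypothetical, not Flink's actual Task implementation): once the task reaches a terminal state, it nulls out the invokable reference, so a lingering reference to the Task object no longer keeps the invokable reachable.

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical sketch: drop the invokable reference on shutdown so that
// lingering references to the Task do not pin the invokable in memory.
class TaskSketch {
    // AtomicReference lets the cleanup race safely with concurrent readers.
    private final AtomicReference<Runnable> invokable;

    TaskSketch(Runnable invokable) {
        this.invokable = new AtomicReference<>(invokable);
    }

    void run() {
        Runnable inv = invokable.get();
        try {
            if (inv != null) {
                inv.run();
            }
        } finally {
            // Terminal state reached: release the reference so the
            // invokable (and everything it references) can be collected.
            invokable.set(null);
        }
    }

    boolean invokableReleased() {
        return invokable.get() == null;
    }
}
```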




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (FLINK-9865) flink-hadoop-compatibility should assume Hadoop as provided

2018-07-16 Thread Stephan Ewen (JIRA)
Stephan Ewen created FLINK-9865:
---

 Summary: flink-hadoop-compatibility should assume Hadoop as 
provided
 Key: FLINK-9865
 URL: https://issues.apache.org/jira/browse/FLINK-9865
 Project: Flink
  Issue Type: Bug
  Components: Build System
Affects Versions: 1.5.1, 1.5.0
Reporter: Stephan Ewen
Assignee: Stephan Ewen
 Fix For: 1.6.0


The {{flink-hadoop-compatibility}} project has a *compile* scope dependency on 
Hadoop ({{flink-hadoop-shaded}}). Because of that, the Hadoop dependencies are 
pulled into the user application.

Like in other Hadoop-dependent modules, we should assume that Hadoop is 
provided in the framework classpath already.
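For illustration, a *provided*-scope declaration might look like the following (the artifact name is illustrative; the exact shaded Hadoop artifact depends on the Flink version):

```xml
<!-- Illustrative: marking the Hadoop dependency as provided means it is
     resolved from the framework classpath at runtime instead of being
     bundled into the user application jar. -->
<dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-shaded-hadoop2</artifactId>
    <version>${flink.version}</version>
    <scope>provided</scope>
</dependency>
```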



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (FLINK-9809) Support setting CoLocation constraints on the DataStream Transformations

2018-07-11 Thread Stephan Ewen (JIRA)
Stephan Ewen created FLINK-9809:
---

 Summary: Support setting CoLocation constraints on the DataStream 
Transformations
 Key: FLINK-9809
 URL: https://issues.apache.org/jira/browse/FLINK-9809
 Project: Flink
  Issue Type: Improvement
  Components: Streaming
Reporter: Stephan Ewen
Assignee: Stephan Ewen
 Fix For: 1.6.0


Flink supports co location constraints for operator placement during 
scheduling. This is used internally for iterations, for example, but is not 
exposed to users.

I propose to add a way for expert users to set these constraints. As a first 
step, I would add them to the {{StreamTransformation}}, which is not part of 
the public user-facing classes, but a more internal class in the DataStream 
API. That way we make this initially a hidden feature and can gradually expose 
it more prominently when we agree that this would be a good idea.
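A toy model of how such a hidden setting could behave (class and method names are illustrative, not Flink's actual API): each transformation carries an optional co-location group key, and the scheduler places all transformations sharing a key into the same group.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of a co-location constraint on a transformation.
class TransformationSketch {
    final String name;
    private String coLocationGroupKey; // null = no constraint

    TransformationSketch(String name) { this.name = name; }

    void setCoLocationGroupKey(String key) { this.coLocationGroupKey = key; }

    /**
     * Groups transformations by co-location key; transformations without
     * a key each form their own group.
     */
    static Map<String, List<String>> groupByCoLocation(List<TransformationSketch> ts) {
        Map<String, List<String>> groups = new LinkedHashMap<>();
        for (TransformationSketch t : ts) {
            String key = t.coLocationGroupKey != null ? t.coLocationGroupKey : t.name;
            groups.computeIfAbsent(key, k -> new ArrayList<>()).add(t.name);
        }
        return groups;
    }
}
```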



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (FLINK-9776) Interrupt TaskThread only while in User/Operator code

2018-07-06 Thread Stephan Ewen (JIRA)
Stephan Ewen created FLINK-9776:
---

 Summary: Interrupt TaskThread only while in User/Operator code
 Key: FLINK-9776
 URL: https://issues.apache.org/jira/browse/FLINK-9776
 Project: Flink
  Issue Type: Improvement
  Components: Local Runtime
Reporter: Stephan Ewen
Assignee: Stephan Ewen
 Fix For: 1.6.0


Upon cancellation, the task thread is periodically interrupted.
This helps to pull the thread out of blocking operations in the user code.

Once the thread leaves the user code, the repeated interrupts may interfere 
with the shutdown cleanup logic, causing confusing exceptions.

We should stop sending the periodic interrupts once the thread leaves the user 
code.
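The mechanism can be sketched as follows (names hypothetical): the task thread flips a flag when entering and leaving user code, and the canceler only delivers interrupts while the flag is set.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Sketch: interrupts are only delivered while the thread is in user code.
class InterruptibleSection {
    private final AtomicBoolean inUserCode = new AtomicBoolean(false);
    private volatile Thread taskThread;

    void runUserCode(Runnable userCode) {
        taskThread = Thread.currentThread();
        inUserCode.set(true);
        try {
            userCode.run();
        } finally {
            // After leaving user code, further interrupts would only
            // disturb the shutdown cleanup logic, so stop accepting them.
            inUserCode.set(false);
            Thread.interrupted(); // clear any pending interrupt flag
        }
    }

    /** Called periodically by the canceler thread. */
    boolean tryInterrupt() {
        Thread t = taskThread;
        if (t != null && inUserCode.get()) {
            t.interrupt();
            return true;
        }
        return false; // outside user code: do not interrupt
    }
}
```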



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (FLINK-9753) Support ORC/Parquet for StreamingFileSink

2018-07-04 Thread Stephan Ewen (JIRA)
Stephan Ewen created FLINK-9753:
---

 Summary: Support ORC/Parquet for StreamingFileSink
 Key: FLINK-9753
 URL: https://issues.apache.org/jira/browse/FLINK-9753
 Project: Flink
  Issue Type: Sub-task
  Components: Streaming Connectors
Reporter: Stephan Ewen
Assignee: Kostas Kloudas
 Fix For: 1.6.0


Formats like Parquet and ORC are great at compressing data and making it fast 
to scan/filter/project the data.
However, these formats are only efficient if they can columnarize and compress 
a significant amount of data in their columnar format. If they compress only a 
few rows at a time, they produce many short column vectors and are thus much 
less efficient.

The Bucketing Sink has the requirement that data is persistent on the target 
FileSystem on each checkpoint.
Pushing data through a Parquet or ORC encoder and flushing on each checkpoint 
means that for frequent checkpoints, the amount of data compressed/columnarized 
in a block is small. Hence, the result is an inefficiently compressed file.

Making this efficient independently of the checkpoint interval would mean that 
the sink needs to first collect (and persist) a good amount of data and then 
push it through the Parquet/ORC writers.

I suggest approaching this as follows:

 - When writing to the "in progress files", write the raw records 
(TypeSerializer encoding).
 - When the "in progress file" is rolled over (published), the sink pushes the 
data through the encoder.
 - This is not much work on top of the new abstraction and will result in large 
blocks, and hence in efficient compression.

Alternatively, we can support directly encoding the stream to the "in progress 
files" via Parquet/ORC, if users know that their combination of data rate and 
checkpoint interval will result in large enough chunks of data per checkpoint 
interval.
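The proposed two-step approach can be sketched as a toy model (the block encoder below is a stand-in for a real Parquet/ORC writer; class names are illustrative): raw records accumulate in the "in progress" staging area, and only on roll-over is the whole batch pushed through the encoder, so it sees one large block instead of many tiny per-checkpoint blocks.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Toy sketch: stage raw records, encode the whole batch only on roll-over.
class RollingEncoderSink<T> {
    private final List<T> inProgress = new ArrayList<>(); // raw records; serializer-encoded in reality
    private final Function<List<T>, byte[]> blockEncoder;  // stand-in for a Parquet/ORC writer

    RollingEncoderSink(Function<List<T>, byte[]> blockEncoder) {
        this.blockEncoder = blockEncoder;
    }

    void write(T record) {
        inProgress.add(record); // cheap row-wise append, persisted on checkpoint
    }

    /** On roll-over, encode the entire staged batch as one large block. */
    byte[] rollOver() {
        byte[] block = blockEncoder.apply(inProgress);
        inProgress.clear();
        return block;
    }
}
```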



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (FLINK-9752) Add an S3 RecoverableWriter

2018-07-04 Thread Stephan Ewen (JIRA)
Stephan Ewen created FLINK-9752:
---

 Summary: Add an S3 RecoverableWriter
 Key: FLINK-9752
 URL: https://issues.apache.org/jira/browse/FLINK-9752
 Project: Flink
  Issue Type: Sub-task
  Components: Streaming Connectors
Reporter: Stephan Ewen
Assignee: Kostas Kloudas


S3 offers persistence only when uploads are complete. That means at the end of 
simple uploads and uploads of parts of a MultiPartUpload.

We should implement a RecoverableWriter for S3 that does a MultiPartUpload with 
a Part per checkpoint.
Recovering the writer needs the MultiPartUpload ID and the list of ETags of 
the previous parts.

We need additional staging of data in Flink state to work around the fact that
 - Parts in a MultiPartUpload must be at least 5MB
 - Part sizes must be known up front. (Note that data can still be streamed in 
the upload)
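A sketch of the recovery handle this implies (names hypothetical, not the actual implementation): resuming the upload requires the upload id plus the ETags of all parts uploaded so far, one part per checkpoint.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the state needed to resume an S3 MultiPartUpload.
class S3Recoverable {
    final String multiPartUploadId;
    final List<String> partETags; // ETags of completed parts, in part order

    S3Recoverable(String uploadId, List<String> etags) {
        this.multiPartUploadId = uploadId;
        this.partETags = new ArrayList<>(etags);
    }

    /**
     * A new recoverable after one more part (i.e. one more checkpoint)
     * completes. Immutable handles make them safe to store in state.
     */
    S3Recoverable withPart(String etag) {
        List<String> next = new ArrayList<>(partETags);
        next.add(etag);
        return new S3Recoverable(multiPartUploadId, next);
    }
}
```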



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (FLINK-9751) Add a RecoverableWriter to the FileSystem abstraction

2018-07-04 Thread Stephan Ewen (JIRA)
Stephan Ewen created FLINK-9751:
---

 Summary: Add a RecoverableWriter to the FileSystem abstraction
 Key: FLINK-9751
 URL: https://issues.apache.org/jira/browse/FLINK-9751
 Project: Flink
  Issue Type: Sub-task
Affects Versions: 1.6.0
Reporter: Stephan Ewen
Assignee: Stephan Ewen


The core operation of the StreamingFileSink is to append result data to 
(hidden) "in progress" files and then, when the files should roll over, publish 
them as visible files. At each checkpoint, the data so far must be persistent 
in the "in progress" files. On recovery, we resume the "in progress" file at 
the exact position of the checkpoint, or publish up to the position of that 
checkpoint.

In order to support various file systems and object stores, we need an 
interface that captures these core operations and allows for different 
implementations (such as file truncate/append on posix, MultiPartUpload on S3, 
...)

Proposed interface:
{code:java}
/**
 * A handle to an in-progress stream with a defined and persistent amount of data.
 * The handle can be used to recover the stream and publish the result file.
 */
interface CommitRecoverable { ... }

/**
 * A handle to an in-progress stream with a defined and persistent amount of data.
 * The handle can be used to recover the stream and either publish the result file
 * or keep appending data to the stream.
 */
interface ResumeRecoverable extends CommitRecoverable { ... }

/**
 * An output stream to a file system that can be recovered at well defined points.
 * The stream initially writes to hidden files or temp files and only creates the
 * target file once it is closed and "committed".
 */
public abstract class RecoverableFsDataOutputStream extends FSDataOutputStream {

    /**
     * Ensures all data so far is persistent (similar to {@link #sync()}) and returns
     * a handle to recover the stream at the current position.
     */
    public abstract ResumeRecoverable persist() throws IOException;

    /**
     * Closes the stream, ensuring persistence of all data (similar to {@link #sync()}).
     * This returns a Committer that can be used to publish (make visible) the file
     * that the stream was writing to.
     */
    public abstract Committer closeForCommit() throws IOException;

    /**
     * A committer can publish the file of a stream that was closed.
     * The Committer can be recovered via a {@link CommitRecoverable}.
     */
    public interface Committer {

        void commit() throws IOException;

        CommitRecoverable getRecoverable();
    }
}

/**
 * The RecoverableWriter creates and recovers RecoverableFsDataOutputStream.
 */
public interface RecoverableWriter {

    RecoverableFsDataOutputStream open(Path path) throws IOException;

    RecoverableFsDataOutputStream recover(ResumeRecoverable resumable) throws IOException;

    RecoverableFsDataOutputStream.Committer recoverForCommit(CommitRecoverable resumable) throws IOException;
}
{code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (FLINK-9750) Create new StreamingFileSink that works on Flink's FileSystem abstraction

2018-07-04 Thread Stephan Ewen (JIRA)
Stephan Ewen created FLINK-9750:
---

 Summary: Create new StreamingFileSink that works on Flink's 
FileSystem abstraction
 Key: FLINK-9750
 URL: https://issues.apache.org/jira/browse/FLINK-9750
 Project: Flink
  Issue Type: Sub-task
  Components: Streaming Connectors
Reporter: Stephan Ewen
Assignee: Kostas Kloudas
 Fix For: 1.6.0


Using Flink's own file system abstraction means that we can add additional 
streaming/checkpointing related behavior.

In addition, the new StreamingFileSink should rely only on internal 
checkpointed state about which files are possibly in progress or need to roll 
over, and never assume enumeration of files in the file system.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (FLINK-9749) Rework Bucketing Sink

2018-07-04 Thread Stephan Ewen (JIRA)
Stephan Ewen created FLINK-9749:
---

 Summary: Rework Bucketing Sink
 Key: FLINK-9749
 URL: https://issues.apache.org/jira/browse/FLINK-9749
 Project: Flink
  Issue Type: New Feature
  Components: Streaming Connectors
Reporter: Stephan Ewen
Assignee: Kostas Kloudas
 Fix For: 1.6.0


The BucketingSink has a series of deficits at the moment.

Due to the long list of issues, I would suggest to add a new StreamingFileSink 
with a new and cleaner design.

h3. Encoders, Parquet, ORC

 - It only efficiently supports row-wise data formats (Avro, JSON, sequence 
files).
 - Efforts to add (columnar) compression for blocks of data are inefficient, 
because blocks cannot span checkpoints due to persistence-on-checkpoint.
 - The encoders are part of the {{flink-connector-filesystem}} project, rather 
than in orthogonal format projects. This blows up the dependencies of the 
{{flink-connector-filesystem}} project. As an example, the rolling file sink 
has dependencies on Hadoop and Avro, which messes up dependency management.

h3. Use of FileSystems

 - The BucketingSink works only on Hadoop's FileSystem abstraction, does not 
support Flink's own FileSystem abstraction, and cannot work with the packaged 
S3, maprfs, and swift file systems.
 - The sink hence needs Hadoop as a dependency.
 - The sink relies on "trying out" whether truncation works, which requires 
write access to the user's working directory.
 - The sink relies on enumerating and counting files, rather than maintaining 
its own state, making it less efficient.

h3. Correctness and Efficiency on S3
 - The BucketingSink relies on strong consistency in the file enumeration, 
hence may work incorrectly on S3.
 - The BucketingSink relies on persisting streams at intermediate points. This 
is not working properly on S3, hence there may be data loss on S3.

h3. .valid-length companion file
 - The valid-length companion file makes it hard for consumers of the data and 
should be dropped.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (FLINK-9428) Allow operators to flush data on checkpoint pre-barrier

2018-05-23 Thread Stephan Ewen (JIRA)
Stephan Ewen created FLINK-9428:
---

 Summary: Allow operators to flush data on checkpoint pre-barrier
 Key: FLINK-9428
 URL: https://issues.apache.org/jira/browse/FLINK-9428
 Project: Flink
  Issue Type: New Feature
  Components: State Backends, Checkpointing
Reporter: Stephan Ewen
Assignee: Stephan Ewen
 Fix For: 1.6.0


Some operators maintain some small transient state that may be inefficient to 
checkpoint, especially when it would also need to be checkpointed in a 
re-scalable way.
An example is an opportunistic pre-aggregation operator, which has a small 
pre-aggregation state that is frequently flushed downstream.

Rather than persisting that state in a checkpoint, it can make sense to flush 
the data downstream upon a checkpoint, to let it be part of the downstream 
operator's state.

This feature is sensitive, because flushing state has a clear implication on 
the downstream operator's checkpoint alignment. However, used with care, and 
with the new back-pressure-based checkpoint alignment, this feature can be very 
useful.

Because it is sensitive, I suggest making this an internal feature only 
(accessible to operators) and NOT exposing it in the public API at this point.
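A toy sketch of the hook (names follow the issue's intent but are hypothetical, not necessarily Flink's final API): the operator gets a callback before the barrier is forwarded and flushes its transient pre-aggregation downstream instead of snapshotting it.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch: flush transient pre-aggregation state on the pre-barrier
// notification, so it becomes part of the downstream operator's state.
class PreAggOperator {
    private final Map<String, Integer> preAgg = new HashMap<>(); // transient state
    private final List<String> downstream = new ArrayList<>();   // stand-in for the output

    void process(String key) {
        preAgg.merge(key, 1, Integer::sum);
    }

    /** Called before the checkpoint barrier is forwarded: flush, don't snapshot. */
    void prepareSnapshotPreBarrier(long checkpointId) {
        for (Map.Entry<String, Integer> e : preAgg.entrySet()) {
            downstream.add(e.getKey() + "=" + e.getValue());
        }
        preAgg.clear(); // nothing left to put into this operator's checkpoint
    }

    int transientStateSize() { return preAgg.size(); }
    List<String> emitted() { return downstream; }
}
```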



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (FLINK-9392) Add @FunctionalInterface annotations to all core functional interfaces

2018-05-17 Thread Stephan Ewen (JIRA)
Stephan Ewen created FLINK-9392:
---

 Summary: Add @FunctionalInterface annotations to all core 
functional interfaces
 Key: FLINK-9392
 URL: https://issues.apache.org/jira/browse/FLINK-9392
 Project: Flink
  Issue Type: Improvement
  Components: Core
Reporter: Stephan Ewen
Assignee: Stephan Ewen


The {{@FunctionalInterface}} annotation should be added to all SAM interfaces 
in order to prevent accidentally breaking them (as non SAMs).

We had a case of that before for the {{SinkFunction}} which was compatible 
through default methods, but incompatible for users that previously 
instantiated that interface through a lambda.
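For illustration (interface name hypothetical, modeled on the SinkFunction case): with the annotation in place, the compiler rejects any change that adds a second abstract method, while default methods remain allowed.

```java
// With @FunctionalInterface, adding another *abstract* method becomes a
// compile error instead of silently breaking lambda-instantiated users.
@FunctionalInterface
interface SinkLike<T> {
    void invoke(T value);

    // A default method keeps the interface a SAM type and stays
    // compatible with existing lambdas...
    default void invokeWithContext(T value, Object context) {
        invoke(value);
    }
    // ...whereas a second abstract method here would no longer compile.
}
```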



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (FLINK-9314) Enable SSL mutual authentication for Netty / TaskManagers

2018-05-07 Thread Stephan Ewen (JIRA)
Stephan Ewen created FLINK-9314:
---

 Summary: Enable SSL mutual authentication for Netty / TaskManagers
 Key: FLINK-9314
 URL: https://issues.apache.org/jira/browse/FLINK-9314
 Project: Flink
  Issue Type: Sub-task
  Components: Security
Reporter: Stephan Ewen
Assignee: Stephan Ewen


Making sure that TaskManagers authenticate both ways (client and server) 
requires giving access to keystore and truststore on both ends, and enabling 
the client authentication flag when creating the SSL Engine.
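The client-authentication flag in question, shown on a plain JSSE SSLEngine (Flink's Netty setup wraps this; the default SSLContext stands in for one built from the configured keystore and truststore):

```java
import javax.net.ssl.SSLContext;
import javax.net.ssl.SSLEngine;

// Sketch: a server-side SSLEngine that requires the client's certificate.
class MutualAuthSketch {
    static SSLEngine serverEngine() {
        try {
            // In the real setup the context is built from the configured
            // keystore/truststore; the default context shows the flag itself.
            SSLEngine engine = SSLContext.getDefault().createSSLEngine();
            engine.setUseClientMode(false); // act as the server side
            engine.setNeedClientAuth(true); // require the client's certificate
            return engine;
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }
}
```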



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (FLINK-9313) Enable mutual authentication for RPC (akka)

2018-05-07 Thread Stephan Ewen (JIRA)
Stephan Ewen created FLINK-9313:
---

 Summary: Enable mutual authentication for RPC (akka) 
 Key: FLINK-9313
 URL: https://issues.apache.org/jira/browse/FLINK-9313
 Project: Flink
  Issue Type: Sub-task
  Components: Distributed Coordination
Reporter: Stephan Ewen
Assignee: Stephan Ewen


Trivial, just needs to add the respective line in the akka configuration.
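Presumably the line in question is akka's remote SSL setting (the exact key depends on the akka version; this is the akka 2.4/2.5 remoting form):

```
akka.remote.netty.ssl.security.require-mutual-authentication = on
```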



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (FLINK-9310) Update default cyphersuites

2018-05-07 Thread Stephan Ewen (JIRA)
Stephan Ewen created FLINK-9310:
---

 Summary: Update default cyphersuites
 Key: FLINK-9310
 URL: https://issues.apache.org/jira/browse/FLINK-9310
 Project: Flink
  Issue Type: Task
  Components: Security
Affects Versions: 1.4.2
Reporter: Stephan Ewen
Assignee: Stephan Ewen


The current default cipher suite {{TLS_RSA_WITH_AES_128_CBC_SHA}} is no longer 
recommended.

RFC 7525 [1] recommends to use the following cipher suites only:
* TLS_DHE_RSA_WITH_AES_128_GCM_SHA256
* TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256
* TLS_DHE_RSA_WITH_AES_256_GCM_SHA384
* TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384

[1] https://tools.ietf.org/html/rfc7525
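One way to apply the recommendation (a sketch, not the actual patch): restrict the enabled suites to the RFC 7525 set, intersected with what the platform supports, since passing an unsupported suite to {{setEnabledCipherSuites}} throws.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Sketch: select only the RFC 7525 recommended cipher suites from the
// platform's supported list.
class CipherSuites {
    static final List<String> RECOMMENDED = Arrays.asList(
        "TLS_DHE_RSA_WITH_AES_128_GCM_SHA256",
        "TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256",
        "TLS_DHE_RSA_WITH_AES_256_GCM_SHA384",
        "TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384");

    /** Keeps only recommended suites, preserving the supported list's order. */
    static String[] select(String[] supported) {
        List<String> result = new ArrayList<>();
        for (String suite : supported) {
            if (RECOMMENDED.contains(suite)) {
                result.add(suite);
            }
        }
        return result.toArray(new String[0]);
    }
}
```

This would be used as, e.g., `engine.setEnabledCipherSuites(CipherSuites.select(engine.getSupportedCipherSuites()))`.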



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (FLINK-9292) Remove TypeInfoParser

2018-05-03 Thread Stephan Ewen (JIRA)
Stephan Ewen created FLINK-9292:
---

 Summary: Remove TypeInfoParser
 Key: FLINK-9292
 URL: https://issues.apache.org/jira/browse/FLINK-9292
 Project: Flink
  Issue Type: Task
  Components: Core
Reporter: Stephan Ewen


The {{TypeInfoParser}} has been deprecated, in favor of the {{TypeHint}}.

Because the TypeInfoParser is also not working correctly with respect to 
classloading, we should remove it. Users still find the class, try to use it, 
and run into problems.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (FLINK-9279) PythonPlanBinderTest flakey

2018-04-30 Thread Stephan Ewen (JIRA)
Stephan Ewen created FLINK-9279:
---

 Summary: PythonPlanBinderTest flakey
 Key: FLINK-9279
 URL: https://issues.apache.org/jira/browse/FLINK-9279
 Project: Flink
  Issue Type: Bug
  Components: Python API, Tests
Affects Versions: 1.5.0
Reporter: Stephan Ewen


The test fails while trying to create the parent directory {{/tmp/flink}}. That 
happens if a file with that name already exists.

The Python Plan binder apparently uses a fixed name for the temp directory, 
but should use a statistically unique random name instead.
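The fix amounts to letting the JDK pick a unique temp directory instead of a fixed {{/tmp/flink}} path, for example:

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Sketch: a statistically unique temp directory per run, so concurrent
// or stale runs cannot collide with an existing file of the same name.
class TempDirs {
    static Path uniqueTempDir() {
        try {
            // Appends a random suffix to the prefix, e.g. /tmp/flink-8123...
            return Files.createTempDirectory("flink-");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```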

Full test run log: https://api.travis-ci.org/v3/job/373120733/log.txt

Relevant Stack Trace
{code}
Job execution failed.
org.apache.flink.runtime.client.JobExecutionException: Job execution failed.
at 
org.apache.flink.runtime.jobmanager.JobManager$$anonfun$handleMessage$1$$anonfun$applyOrElse$6.apply$mcV$sp(JobManager.scala:898)
at 
org.apache.flink.runtime.jobmanager.JobManager$$anonfun$handleMessage$1$$anonfun$applyOrElse$6.apply(JobManager.scala:841)
at 
org.apache.flink.runtime.jobmanager.JobManager$$anonfun$handleMessage$1$$anonfun$applyOrElse$6.apply(JobManager.scala:841)
at 
scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24)
at 
scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24)
at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:39)
at 
akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:415)
at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
at 
scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
at 
scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
at 
scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
Caused by: java.io.IOException: Mkdirs failed to create /tmp/flink
at 
org.apache.flink.core.fs.local.LocalFileSystem.create(LocalFileSystem.java:271)
at 
org.apache.flink.core.fs.SafetyNetWrapperFileSystem.create(SafetyNetWrapperFileSystem.java:121)
at 
org.apache.flink.api.common.io.FileOutputFormat.open(FileOutputFormat.java:248)
at 
org.apache.flink.api.java.io.CsvOutputFormat.open(CsvOutputFormat.java:161)
at 
org.apache.flink.runtime.operators.DataSinkTask.invoke(DataSinkTask.java:202)
at org.apache.flink.runtime.taskmanager.Task.run(Task.java:702)
at java.lang.Thread.run(Thread.java:748)
Tests run: 3, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 59.839 sec <<< 
FAILURE! - in org.apache.flink.python.api.PythonPlanBinderTest
testJobWithoutObjectReuse(org.apache.flink.python.api.PythonPlanBinderTest)  
Time elapsed: 14.912 sec  <<< FAILURE!
java.lang.AssertionError: Error while calling the test program: Job execution 
failed.
at org.junit.Assert.fail(Assert.java:88)
at 
org.apache.flink.test.util.JavaProgramTestBase.testJobWithoutObjectReuse(JavaProgramTestBase.java:161)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
at 
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
at 
org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
at 
org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
at org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:55)
at org.junit.rules.RunRules.evaluate(RunRules.java:20)
at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
at org.junit.rules.ExternalResource$1.evaluate(ExternalResource.java:48)
at org.junit.rules.ExternalResource$1.evaluate(ExternalResource.java:48)
at org.junit.rules.RunRules.evaluate(RunRules.java:20)
at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
at 

[jira] [Created] (FLINK-9277) Reduce noisiness of SlotPool logging

2018-04-30 Thread Stephan Ewen (JIRA)
Stephan Ewen created FLINK-9277:
---

 Summary: Reduce noisiness of SlotPool logging
 Key: FLINK-9277
 URL: https://issues.apache.org/jira/browse/FLINK-9277
 Project: Flink
  Issue Type: Improvement
  Components: Distributed Coordination
Affects Versions: 1.5.0
Reporter: Stephan Ewen
Assignee: Till Rohrmann


The slot pool logs a very large amount of stack traces with meaningless 
exceptions like

{code}
org.apache.flink.util.FlinkException: Release multi task slot because all 
children have been released.
{code}

This makes log parsing very hard.

For an example, see this log: 
https://gist.githubusercontent.com/GJL/3b109db48734ff40103f47d04fc54bd3/raw/e3afc0ec3f452bad681e388016bcf799bba56f10/gistfile1.txt



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (FLINK-9276) Improve error message when TaskManager fails

2018-04-30 Thread Stephan Ewen (JIRA)
Stephan Ewen created FLINK-9276:
---

 Summary: Improve error message when TaskManager fails
 Key: FLINK-9276
 URL: https://issues.apache.org/jira/browse/FLINK-9276
 Project: Flink
  Issue Type: Improvement
  Components: Distributed Coordination
Affects Versions: 1.5.0
Reporter: Stephan Ewen


When a TaskManager fails, we frequently get a message

{code}
org.apache.flink.util.FlinkException: Releasing TaskManager 
container_1524853016208_0001_01_000102
{code}

This message is misleading in that it sounds like an intended operation, when 
it really is a failure of a container that the {{ResourceManager}} reports to 
the {{JobManager}}.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (FLINK-9198) Improve error messages in AbstractDeserializationSchema for type extraction

2018-04-17 Thread Stephan Ewen (JIRA)
Stephan Ewen created FLINK-9198:
---

 Summary: Improve error messages in AbstractDeserializationSchema 
for type extraction
 Key: FLINK-9198
 URL: https://issues.apache.org/jira/browse/FLINK-9198
 Project: Flink
  Issue Type: Improvement
Affects Versions: 1.4.2
Reporter: Stephan Ewen
Assignee: Stephan Ewen
 Fix For: 1.5.0


User feedback: When type extraction fails in the 
{{AbstractDeserializationSchema}}, the error message does not explain fully how 
to fix this.

I suggest to improve the error message and add some convenience constructors to 
directly pass TypeInformation when needed.

We can also simplify the class a bit, because TypeInformation needs no longer 
be dropped during serialization.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (FLINK-9197) Improve error message for TypyInformation and TypeHint with generics

2018-04-17 Thread Stephan Ewen (JIRA)
Stephan Ewen created FLINK-9197:
---

 Summary: Improve error message for TypyInformation and TypeHint 
with generics
 Key: FLINK-9197
 URL: https://issues.apache.org/jira/browse/FLINK-9197
 Project: Flink
  Issue Type: Improvement
  Components: Core
Affects Versions: 1.4.2
Reporter: Stephan Ewen
Assignee: Stephan Ewen
 Fix For: 1.5.0


User feedback: When using a {{TypeHint}} with a generic type variable, the 
error message could be better. Similarly, when using 
{{TypeInformation.of(Tuple2.class)}}, the error message should refer the user 
to the TypeHint method.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (FLINK-9192) Undo parameterization of StateMachine Example

2018-04-17 Thread Stephan Ewen (JIRA)
Stephan Ewen created FLINK-9192:
---

 Summary: Undo parameterization of StateMachine Example
 Key: FLINK-9192
 URL: https://issues.apache.org/jira/browse/FLINK-9192
 Project: Flink
  Issue Type: Improvement
Reporter: Stephan Ewen


The example has been changed to add parametrization and a different sink.

I would vote to undo these changes; they make the example less nice and use 
non-recommended sinks:

  - For state backend, incremental checkpoints, async checkpoints, etc.: having 
these settings in the example blows up the parameter list of the example and 
distracts from what the example is about.
  - If the main reason for this is an end-to-end test, then these settings 
should go into the test's Flink config.
  - The {{writeAsText}} sink is not recommended to use, because it is not 
integrated with checkpoints and has no well-defined semantics.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (FLINK-9189) Add a SBT and Gradle Quickstarts

2018-04-17 Thread Stephan Ewen (JIRA)
Stephan Ewen created FLINK-9189:
---

 Summary: Add a SBT and Gradle Quickstarts
 Key: FLINK-9189
 URL: https://issues.apache.org/jira/browse/FLINK-9189
 Project: Flink
  Issue Type: Improvement
  Components: Quickstarts
Reporter: Stephan Ewen


Having a proper project template helps a lot in getting dependencies right. For 
example, setting the core dependencies to "provided", the connector / library 
dependencies to "compile", etc.

The Maven quickstarts are in good shape by now, but I observed SBT and Gradle 
users to get this wrong quite often.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (FLINK-9098) ClassLoaderITCase unstable

2018-03-27 Thread Stephan Ewen (JIRA)
Stephan Ewen created FLINK-9098:
---

 Summary: ClassLoaderITCase unstable
 Key: FLINK-9098
 URL: https://issues.apache.org/jira/browse/FLINK-9098
 Project: Flink
  Issue Type: Bug
  Components: Tests
Reporter: Stephan Ewen
 Fix For: 1.5.0


Some savepoint disposal seems to fail; after that, all subsequent tests fail 
because there are no longer enough slots.

Full log: https://api.travis-ci.org/v3/job/356900367/log.txt



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (FLINK-9045) LocalEnvironment with web UI does not work with flip-6

2018-03-21 Thread Stephan Ewen (JIRA)
Stephan Ewen created FLINK-9045:
---

 Summary: LocalEnvironment with web UI does not work with flip-6
 Key: FLINK-9045
 URL: https://issues.apache.org/jira/browse/FLINK-9045
 Project: Flink
  Issue Type: Bug
  Components: Webfrontend
Reporter: Stephan Ewen
 Fix For: 1.5.0


The following code is supposed to start a web UI when executing in-IDE. Does 
not work with flip-6, as far as I can see.

{code}
final ExecutionEnvironment env = 
ExecutionEnvironment.createLocalEnvironmentWithWebUI(new Configuration());
{code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (FLINK-9037) Test flake Kafka09ITCase#testCancelingEmptyTopic

2018-03-21 Thread Stephan Ewen (JIRA)
Stephan Ewen created FLINK-9037:
---

 Summary: Test flake Kafka09ITCase#testCancelingEmptyTopic
 Key: FLINK-9037
 URL: https://issues.apache.org/jira/browse/FLINK-9037
 Project: Flink
  Issue Type: Bug
  Components: Tests
Affects Versions: 1.5.0
Reporter: Stephan Ewen


{code}
Test 
testCancelingEmptyTopic(org.apache.flink.streaming.connectors.kafka.Kafka09ITCase)
 failed with:
org.junit.runners.model.TestTimedOutException: test timed out after 6 
milliseconds
{code}

Full log: https://api.travis-ci.org/v3/job/356044885/log.txt



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (FLINK-9036) Add default value via suppliers

2018-03-20 Thread Stephan Ewen (JIRA)
Stephan Ewen created FLINK-9036:
---

 Summary: Add default value via suppliers
 Key: FLINK-9036
 URL: https://issues.apache.org/jira/browse/FLINK-9036
 Project: Flink
  Issue Type: Improvement
  Components: State Backends, Checkpointing
Reporter: Stephan Ewen
Assignee: Stephan Ewen
 Fix For: 1.6.0


Earlier versions had a default value in {{ValueState}}. We dropped that, 
because the value would have to be duplicated on each access, to be safe 
against side effects when using mutable types.

For convenience, we should re-add the feature, but using a supplier/factory 
function to create the default value on access.
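A sketch of the proposed behavior (hypothetical class, not Flink's actual ValueState): the default is produced by a Supplier on each access, so a mutable default instance is never shared between accesses.

```java
import java.util.function.Supplier;

// Sketch: default values created lazily via a supplier, so each access
// gets a fresh instance and mutation side effects cannot leak.
class ValueStateSketch<T> {
    private final Supplier<T> defaultSupplier;
    private T value; // null = not set

    ValueStateSketch(Supplier<T> defaultSupplier) {
        this.defaultSupplier = defaultSupplier;
    }

    T value() {
        // A fresh default per access: no shared mutable instance.
        return value != null ? value : defaultSupplier.get();
    }

    void update(T newValue) {
        this.value = newValue;
    }
}
```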



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (FLINK-9035) State Descriptors have broken hashCode() and equals()

2018-03-20 Thread Stephan Ewen (JIRA)
Stephan Ewen created FLINK-9035:
---

 Summary: State Descriptors have broken hashCode() and equals()
 Key: FLINK-9035
 URL: https://issues.apache.org/jira/browse/FLINK-9035
 Project: Flink
  Issue Type: Bug
  Components: State Backends, Checkpointing
Affects Versions: 1.4.2, 1.5.0
Reporter: Stephan Ewen
Assignee: Stephan Ewen
 Fix For: 1.6.0


The following code fails with a {{NullPointerException}}:
{code}
ValueStateDescriptor<String> descr = new ValueStateDescriptor<>("name", String.class);
descr.hashCode();
{code}
The {{hashCode()}} function tries to access the {{serializer}} field, which may 
be uninitialized at that point.

The {{equals()}} method is equally broken (no pun intended):

{code}
ValueStateDescriptor<String> a = new ValueStateDescriptor<>("name", String.class);
ValueStateDescriptor<String> b = new ValueStateDescriptor<>("name", String.class);

a.equals(b) // exception
b.equals(a) // exception

a.initializeSerializerUnlessSet(new ExecutionConfig());

a.equals(b) // false
b.equals(a) // exception

b.initializeSerializerUnlessSet(new ExecutionConfig());

a.equals(b) // true
b.equals(a) // true
{code}
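One possible null-safe fix can be sketched as follows (illustrative, not the actual patch): base {{hashCode()}}/{{equals()}} only on fields that are always initialized, such as the state name and the descriptor class, and leave the lazily created serializer out of the contract.

```java
import java.util.Objects;

// Sketch: hashCode()/equals() that never touch the lazily initialized
// serializer field, so they work before initializeSerializerUnlessSet().
class StateDescriptorSketch {
    private final String name;
    private Object serializer; // lazily initialized, may still be null

    StateDescriptorSketch(String name) {
        this.name = name;
    }

    @Override
    public int hashCode() {
        // Only always-initialized fields participate.
        return Objects.hash(getClass(), name);
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (o == null || getClass() != o.getClass()) return false;
        return name.equals(((StateDescriptorSketch) o).name);
    }
}
```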





--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (FLINK-9034) State Descriptors drop TypeInformation on serialization

2018-03-20 Thread Stephan Ewen (JIRA)
Stephan Ewen created FLINK-9034:
---

 Summary: State Descriptors drop TypeInformation on serialization
 Key: FLINK-9034
 URL: https://issues.apache.org/jira/browse/FLINK-9034
 Project: Flink
  Issue Type: Bug
  Components: State Backends, Checkpointing
Affects Versions: 1.4.2, 1.5.0
Reporter: Stephan Ewen
 Fix For: 1.6.0


The following code currently causes problems

{code}
public class MyFunction extends RichMapFunction<MyType, MyType> {

    private final ValueStateDescriptor<MyType> descr =
            new ValueStateDescriptor<>("state name", MyType.class);

    private ValueState<MyType> state;

    @Override
    public void open(Configuration parameters) {
        state = getRuntimeContext().getState(descr);
    }
}
{code}

The problem is that the state descriptor drops the type information and creates 
a serializer before serialization, as part of shipping the function to the 
cluster. To do that, it initializes the serializer with an empty execution 
config, making serialization inconsistent.

This is mainly an artifact from the days when dropping the type information 
before shipping was necessary, because the type info was not serializable. It 
now is, and we can fix that bug.








[jira] [Created] (FLINK-8885) The DispatcherThreadFactory should register uncaught exception handlers

2018-03-06 Thread Stephan Ewen (JIRA)
Stephan Ewen created FLINK-8885:
---

 Summary: The DispatcherThreadFactory should register uncaught 
exception handlers
 Key: FLINK-8885
 URL: https://issues.apache.org/jira/browse/FLINK-8885
 Project: Flink
  Issue Type: Bug
  Components: TaskManager
Reporter: Stephan Ewen
Assignee: Stephan Ewen
 Fix For: 1.5.0, 1.6.0


The {{DispatcherThreadFactory}} is responsible for spawning the thread pool 
threads for the TaskManager's async dispatcher and for the CheckpointCoordinator's 
timed trigger.

In case of uncaught exceptions in these threads, the system is no longer 
healthy, hence they should register the {{FatalExitUncaughtExceptionsHandler}}.
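The idea can be sketched with a plain-JDK thread factory (hypothetical names, not the actual Flink classes):

```java
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.atomic.AtomicInteger;

// Sketch of a thread factory that installs an uncaught-exception handler on
// every thread it creates. Flink's real handler would force a fatal process
// exit; this one only logs, to keep the example side-effect free.
class GuardedThreadFactory implements ThreadFactory {
    private final String prefix;
    private final AtomicInteger counter = new AtomicInteger();

    GuardedThreadFactory(String prefix) {
        this.prefix = prefix;
    }

    @Override
    public Thread newThread(Runnable r) {
        Thread t = new Thread(r, prefix + '-' + counter.getAndIncrement());
        t.setUncaughtExceptionHandler((thread, error) ->
                System.err.println("FATAL error in " + thread.getName() + ": " + error));
        return t;
    }
}
```

Every thread created through such a factory then has a handler attached, so an escaped exception can no longer die silently.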





[jira] [Created] (FLINK-8884) The DispatcherThreadFactory should register uncaught exception handlers

2018-03-06 Thread Stephan Ewen (JIRA)
Stephan Ewen created FLINK-8884:
---

 Summary: The DispatcherThreadFactory should register uncaught 
exception handlers
 Key: FLINK-8884
 URL: https://issues.apache.org/jira/browse/FLINK-8884
 Project: Flink
  Issue Type: Bug
  Components: TaskManager
Reporter: Stephan Ewen
Assignee: Stephan Ewen
 Fix For: 1.5.0, 1.6.0


The {{DispatcherThreadFactory}} is responsible for spawning the thread pool 
threads for the TaskManager's async dispatcher and for the CheckpointCoordinator's 
timed trigger.

In case of uncaught exceptions in these threads, the system is no longer 
healthy, hence they should register the {{FatalExitUncaughtExceptionsHandler}}.





[jira] [Created] (FLINK-8883) ExceptionUtils.rethrowIfFatalError should treat ThreadDeath as fatal.

2018-03-06 Thread Stephan Ewen (JIRA)
Stephan Ewen created FLINK-8883:
---

 Summary: ExceptionUtils.rethrowIfFatalError should treat 
ThreadDeath as fatal.
 Key: FLINK-8883
 URL: https://issues.apache.org/jira/browse/FLINK-8883
 Project: Flink
  Issue Type: Bug
  Components: Core
Reporter: Stephan Ewen
Assignee: Stephan Ewen
 Fix For: 1.5.0, 1.6.0


Thread deaths leave code in an inconsistent state and should thus always be 
forwarded as fatal exceptions that cannot be handled in any way.
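A hedged sketch of such a check (hypothetical helper, not the exact {{ExceptionUtils}} code):

```java
// Rethrows a Throwable if it is of a kind the JVM cannot safely recover from.
// ThreadDeath is included because a killed thread leaves state inconsistent.
final class FatalErrors {
    private FatalErrors() {}

    static void rethrowIfFatalError(Throwable t) {
        if (t instanceof ThreadDeath
                || t instanceof InternalError
                || t instanceof OutOfMemoryError) {
            throw (Error) t;
        }
    }
}
```

Ordinary exceptions pass through unchanged; only the unrecoverable error types are propagated immediately.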





[jira] [Created] (FLINK-8879) Add concurrent access check to AvroSerializer

2018-03-06 Thread Stephan Ewen (JIRA)
Stephan Ewen created FLINK-8879:
---

 Summary: Add concurrent access check to AvroSerializer
 Key: FLINK-8879
 URL: https://issues.apache.org/jira/browse/FLINK-8879
 Project: Flink
  Issue Type: Sub-task
  Components: Core
Reporter: Stephan Ewen
Assignee: Stephan Ewen
 Fix For: 1.5.0, 1.6.0


On debug log level and during tests, the AvroSerializer should check whether it 
is concurrently accessed, and throw an exception in that case.





[jira] [Created] (FLINK-8878) Check for concurrent access to Kryo Serializer

2018-03-06 Thread Stephan Ewen (JIRA)
Stephan Ewen created FLINK-8878:
---

 Summary: Check for concurrent access to Kryo Serializer
 Key: FLINK-8878
 URL: https://issues.apache.org/jira/browse/FLINK-8878
 Project: Flink
  Issue Type: Sub-task
  Components: Core
Reporter: Stephan Ewen
Assignee: Stephan Ewen
 Fix For: 1.5.0, 1.6.0


On debug log level and during tests, the {{KryoSerializer}} should check 
whether it is concurrently accessed, and throw an exception in that case.





[jira] [Created] (FLINK-8877) Configure Kryo's log level based on Flink's log level

2018-03-06 Thread Stephan Ewen (JIRA)
Stephan Ewen created FLINK-8877:
---

 Summary: Configure Kryo's log level based on Flink's log level
 Key: FLINK-8877
 URL: https://issues.apache.org/jira/browse/FLINK-8877
 Project: Flink
  Issue Type: Sub-task
  Components: Core
Reporter: Stephan Ewen
Assignee: Stephan Ewen
 Fix For: 1.5.0, 1.6.0


Kryo uses its embedded MinLog for logging.

When Flink is set to trace, Kryo should be set to trace as well. Other log 
levels should not be used, as even debug logging in Kryo results in excessive 
logging.





[jira] [Created] (FLINK-8876) Improve concurrent access handling in stateful serializers

2018-03-06 Thread Stephan Ewen (JIRA)
Stephan Ewen created FLINK-8876:
---

 Summary: Improve concurrent access handling in stateful serializers
 Key: FLINK-8876
 URL: https://issues.apache.org/jira/browse/FLINK-8876
 Project: Flink
  Issue Type: Improvement
  Components: Core
Reporter: Stephan Ewen
Assignee: Stephan Ewen
 Fix For: 1.5.0, 1.6.0


Some stateful serializers produce incorrect results when accidentally accessed 
by multiple threads concurrently.

To better catch these cases, I suggest adding concurrency checks that are 
active only when debug logging is enabled and during test runs.

This is inspired by Kryo's checks for concurrent access.
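Such a debug-only guard can be sketched with a compare-and-set flag (hypothetical class, not the code that was ultimately added):

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Guard a stateful serializer could enter/exit around each (de)serialization
// call while checks are enabled: a second thread entering before the first
// one exits trips an IllegalStateException.
class ConcurrentAccessGuard {
    private final AtomicBoolean inUse = new AtomicBoolean(false);

    void enter() {
        if (!inUse.compareAndSet(false, true)) {
            throw new IllegalStateException("Serializer accessed concurrently");
        }
    }

    void exit() {
        inUse.set(false);
    }
}
```

The single atomic flag makes the check cheap enough to leave enabled during tests while still catching accidental sharing across threads.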





[jira] [Created] (FLINK-8856) Move all interrupt() calls to TaskCanceler

2018-03-04 Thread Stephan Ewen (JIRA)
Stephan Ewen created FLINK-8856:
---

 Summary: Move all interrupt() calls to TaskCanceler
 Key: FLINK-8856
 URL: https://issues.apache.org/jira/browse/FLINK-8856
 Project: Flink
  Issue Type: Bug
  Components: TaskManager
Reporter: Stephan Ewen
 Fix For: 1.5.0


We need this to work around the following JVM bug: 
https://bugs.java.com/bugdatabase/view_bug.do?bug_id=8138622

To circumvent this problem, the {{TaskCancelerWatchDog}} must not call 
{{interrupt()}} at all, but only join on the executing thread (with timeout) 
and cause a hard exit once cancellation takes too long.

A user affected by this problem reported it in FLINK-8834.

Personal note: The Thread.join(...) method is unfortunately not 100% reliable 
either, because it uses {{System.currentTimeMillis()}} rather than 
{{System.nanoTime()}}. Because of that, sleeps can take overly long when the 
clock is adjusted. I wonder why the JDK authors do not follow their own 
recommendations and use {{System.nanoTime()}} for all relative time measures...
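The nanoTime-based alternative hinted at in the note could look like this (hypothetical helper, not Flink code): compute the deadline from {{System.nanoTime()}} and re-issue {{Thread.join}} in bounded slices.

```java
// Joins a thread with a deadline computed from System.nanoTime(), so wall-clock
// adjustments cannot stretch the total wait. Returns true if the thread ended.
final class Joins {
    private Joins() {}

    static boolean joinWithDeadline(Thread t, long timeoutMillis) throws InterruptedException {
        final long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        long remainingNanos;
        while (t.isAlive() && (remainingNanos = deadline - System.nanoTime()) > 0) {
            // join in small bounded slices so a clock jump cannot overshoot
            t.join(Math.max(1L, Math.min(remainingNanos / 1_000_000L, 50L)));
        }
        return !t.isAlive();
    }
}
```

Because progress is measured against the monotonic clock, adjusting the system time during the wait affects at most one 50 ms slice.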





[jira] [Created] (FLINK-8837) Move DataStreamUtils to package 'experimental'.

2018-03-02 Thread Stephan Ewen (JIRA)
Stephan Ewen created FLINK-8837:
---

 Summary: Move DataStreamUtils to package 'experimental'.
 Key: FLINK-8837
 URL: https://issues.apache.org/jira/browse/FLINK-8837
 Project: Flink
  Issue Type: Bug
  Components: Streaming
Reporter: Stephan Ewen
 Fix For: 1.5.0


The class {{DataStreamUtils}} came from 'flink-contrib' and was accidentally 
moved to the fully supported API packages. It should be in package 
'experimental' to properly communicate that it is not guaranteed to be API 
stable.





[jira] [Created] (FLINK-8835) Fix TaskManager config keys

2018-03-02 Thread Stephan Ewen (JIRA)
Stephan Ewen created FLINK-8835:
---

 Summary: Fix TaskManager config keys
 Key: FLINK-8835
 URL: https://issues.apache.org/jira/browse/FLINK-8835
 Project: Flink
  Issue Type: Bug
  Components: TaskManager
Reporter: Stephan Ewen
 Fix For: 1.5.0


Many new config keys in the TaskManager don't follow the proper naming scheme. 
We need to clean those up before the release. I would also suggest keeping the 
key names short, because that makes it easier for users.

When doing this cleanup pass over the config keys, I would suggest also making 
some of the existing keys more hierarchical and harmonizing them with the 
common naming scheme in Flink.

## New Keys

* {{taskmanager.network.credit-based-flow-control.enabled}} to 
{{taskmanager.network.credit-model}}.

* {{taskmanager.exactly-once.blocking.data.enabled}} to 
{{task.checkpoint.alignment.blocking}} (we already have 
{{task.checkpoint.alignment.max-size}})

## Existing Keys

* {{taskmanager.debug.memory.startLogThread}} => 
{{taskmanager.debug.memory.log}}

* {{taskmanager.debug.memory.logIntervalMs}} => 
{{taskmanager.debug.memory.log-interval}}

* {{taskmanager.initial-registration-pause}} => 
{{taskmanager.registration.initial-backoff}}

* {{taskmanager.max-registration-pause}} => 
{{taskmanager.registration.max-backoff}}

* {{taskmanager.refused-registration-pause}} => 
{{taskmanager.registration.refused-backoff}}

* {{taskmanager.maxRegistrationDuration}} => 
{{taskmanager.registration.timeout}}






[jira] [Created] (FLINK-8824) In Kafka Consumers, replace 'getCanonicalClassName()' with 'getClassName()'

2018-03-01 Thread Stephan Ewen (JIRA)
Stephan Ewen created FLINK-8824:
---

 Summary: In Kafka Consumers, replace 'getCanonicalClassName()' 
with 'getClassName()'
 Key: FLINK-8824
 URL: https://issues.apache.org/jira/browse/FLINK-8824
 Project: Flink
  Issue Type: Bug
  Components: Kafka Connector
Reporter: Stephan Ewen
 Fix For: 1.5.0


The connector uses {{getCanonicalClassName()}} in all places, rather than 
{{getClassName()}}.

{{getCanonicalClassName()}}'s intention is to normalize class names for arrays, 
etc., but it is problematic when instantiating classes from class names.
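The difference is easy to demonstrate with the JDK's {{Class#getName()}} vs. {{Class#getCanonicalName()}} on a nested class (self-contained demo, class names hypothetical):

```java
// For a nested class, the binary name (getName) uses '$' and round-trips
// through Class.forName, while the canonical name uses '.' and does not.
public class ClassNameDemo {
    static class Inner {}

    public static void main(String[] args) {
        String binary = Inner.class.getName();             // e.g. ClassNameDemo$Inner
        String canonical = Inner.class.getCanonicalName(); // e.g. ClassNameDemo.Inner
        boolean canonicalLoads;
        try {
            Class.forName(canonical);
            canonicalLoads = true;
        } catch (ClassNotFoundException e) {
            canonicalLoads = false;
        }
        System.out.println(binary + " vs " + canonical
                + ", canonical loadable: " + canonicalLoads);
    }
}
```

Instantiating classes from the canonical form therefore fails for nested classes, which is why the binary name is the safer choice.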







[jira] [Created] (FLINK-8806) Failure in UnionInputGate getNextBufferOrEvent()

2018-02-28 Thread Stephan Ewen (JIRA)
Stephan Ewen created FLINK-8806:
---

 Summary: Failure in UnionInputGate getNextBufferOrEvent()
 Key: FLINK-8806
 URL: https://issues.apache.org/jira/browse/FLINK-8806
 Project: Flink
  Issue Type: Bug
  Components: Network
Affects Versions: 1.5.0, 1.6.0
Reporter: Stephan Ewen
 Fix For: 1.5.0, 1.6.0


Error occurs in {{SelfConnectionITCase}}:

Full log: https://api.travis-ci.org/v3/job/346847455/log.txt

Exception Stack Trace
{code}
org.apache.flink.runtime.client.JobExecutionException: 
java.lang.IllegalStateException
at 
org.apache.flink.runtime.minicluster.MiniCluster.executeJobBlocking(MiniCluster.java:527)
at 
org.apache.flink.streaming.util.TestStreamEnvironment.execute(TestStreamEnvironment.java:79)
at 
org.apache.flink.streaming.api.environment.StreamExecutionEnvironment.execute(StreamExecutionEnvironment.java:1510)
at 
org.apache.flink.test.streaming.runtime.SelfConnectionITCase.differentDataStreamDifferentChain(SelfConnectionITCase.java:158)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
at 
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
at 
org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
at 
org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
at org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:55)
at org.junit.rules.RunRules.evaluate(RunRules.java:20)
at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
at org.junit.rules.ExternalResource$1.evaluate(ExternalResource.java:48)
at org.junit.rules.ExternalResource$1.evaluate(ExternalResource.java:48)
at org.junit.rules.RunRules.evaluate(RunRules.java:20)
at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
at org.junit.runners.Suite.runChild(Suite.java:128)
at org.junit.runners.Suite.runChild(Suite.java:27)
at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
at org.junit.runner.JUnitCore.run(JUnitCore.java:137)
at org.junit.runner.JUnitCore.run(JUnitCore.java:115)
at 
org.apache.maven.surefire.junitcore.JUnitCoreWrapper.createRequestAndRun(JUnitCoreWrapper.java:108)
at 
org.apache.maven.surefire.junitcore.JUnitCoreWrapper.executeEager(JUnitCoreWrapper.java:78)
at 
org.apache.maven.surefire.junitcore.JUnitCoreWrapper.execute(JUnitCoreWrapper.java:54)
at 
org.apache.maven.surefire.junitcore.JUnitCoreProvider.invoke(JUnitCoreProvider.java:144)
at 
org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:203)
at 
org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:155)
at 
org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:103)
Caused by: java.lang.IllegalStateException
at 
org.apache.flink.util.Preconditions.checkState(Preconditions.java:179)
at 
org.apache.flink.runtime.io.network.partition.consumer.UnionInputGate.getNextBufferOrEvent(UnionInputGate.java:173)
at 
org.apache.flink.streaming.runtime.io.BarrierTracker.getNextNonBlocked(BarrierTracker.java:94)
at 
org.apache.flink.streaming.runtime.io.StreamTwoInputProcessor.processInput(StreamTwoInputProcessor.java:273)
at 
org.apache.flink.streaming.runtime.tasks.TwoInputStreamTask.run(TwoInputStreamTask.java:115)
at 
org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:307)
at 

[jira] [Created] (FLINK-8803) Mini Cluster Shutdown with HA unstable, causing test failures

2018-02-28 Thread Stephan Ewen (JIRA)
Stephan Ewen created FLINK-8803:
---

 Summary: Mini Cluster Shutdown with HA unstable, causing test 
failures
 Key: FLINK-8803
 URL: https://issues.apache.org/jira/browse/FLINK-8803
 Project: Flink
  Issue Type: Bug
  Components: Tests
Reporter: Stephan Ewen


When the Mini Cluster is created for HA tests with ZooKeeper, the shutdown is 
unstable.

It looks like ZooKeeper may be shut down before the JobManager is shut down, 
causing the shutdown procedure of the JobManager (specifically 
{{ZooKeeperSubmittedJobGraphStore.removeJobGraph}}) to block until tests time 
out.

Full log: https://api.travis-ci.org/v3/job/346853707/log.txt

Note that no ZK threads are alive any more; it seems ZK is already shut down.

Relevant Stack Traces:

{code}

"main" #1 prio=5 os_prio=0 tid=0x7f973800a800 nid=0x43b4 waiting on 
condition [0x7f973eb0b000]
   java.lang.Thread.State: TIMED_WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for  <0x8966cf18> (a 
scala.concurrent.impl.Promise$CompletionLatch)
at 
java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
at 
java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedNanos(AbstractQueuedSynchronizer.java:1037)
at 
java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireSharedNanos(AbstractQueuedSynchronizer.java:1328)
at 
scala.concurrent.impl.Promise$DefaultPromise.tryAwait(Promise.scala:212)
at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:222)
at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:157)
at scala.concurrent.Await$$anonfun$ready$1.apply(package.scala:169)
at scala.concurrent.Await$$anonfun$ready$1.apply(package.scala:169)
at 
scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:53)
at scala.concurrent.Await$.ready(package.scala:169)
at 
org.apache.flink.runtime.minicluster.FlinkMiniCluster.startInternalShutdown(FlinkMiniCluster.scala:469)
at 
org.apache.flink.runtime.minicluster.FlinkMiniCluster.stop(FlinkMiniCluster.scala:435)
at 
org.apache.flink.runtime.minicluster.FlinkMiniCluster.closeAsync(FlinkMiniCluster.scala:719)
at 
org.apache.flink.test.util.MiniClusterResource.after(MiniClusterResource.java:104)
at org.junit.rules.ExternalResource$1.evaluate(ExternalResource.java:50)
...
{code}

{code}
"flink-akka.actor.default-dispatcher-2" #1012 prio=5 os_prio=0 
tid=0x7f97394fa800 nid=0x3328 waiting on condition [0x7f971db29000]
   java.lang.Thread.State: TIMED_WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for  <0x87f82a70> (a 
java.util.concurrent.CountDownLatch$Sync)
at 
java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
at 
java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedNanos(AbstractQueuedSynchronizer.java:1037)
at 
java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireSharedNanos(AbstractQueuedSynchronizer.java:1328)
at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:277)
at 
org.apache.flink.shaded.curator.org.apache.curator.CuratorZookeeperClient.internalBlockUntilConnectedOrTimedOut(CuratorZookeeperClient.java:336)
at 
org.apache.flink.shaded.curator.org.apache.curator.RetryLoop.callWithRetry(RetryLoop.java:107)
at 
org.apache.flink.shaded.curator.org.apache.curator.framework.imps.DeleteBuilderImpl.pathInForeground(DeleteBuilderImpl.java:241)
at 
org.apache.flink.shaded.curator.org.apache.curator.framework.imps.DeleteBuilderImpl.forPath(DeleteBuilderImpl.java:225)
at 
org.apache.flink.shaded.curator.org.apache.curator.framework.imps.DeleteBuilderImpl.forPath(DeleteBuilderImpl.java:35)
at 
org.apache.flink.runtime.zookeeper.ZooKeeperStateHandleStore.release(ZooKeeperStateHandleStore.java:478)
at 
org.apache.flink.runtime.zookeeper.ZooKeeperStateHandleStore.releaseAndTryRemove(ZooKeeperStateHandleStore.java:435)
at 
org.apache.flink.runtime.zookeeper.ZooKeeperStateHandleStore.releaseAndTryRemove(ZooKeeperStateHandleStore.java:405)
at 
org.apache.flink.runtime.jobmanager.ZooKeeperSubmittedJobGraphStore.removeJobGraph(ZooKeeperSubmittedJobGraphStore.java:266)
- locked <0x807f4258> (a java.lang.Object)
at 
org.apache.flink.runtime.jobmanager.JobManager$$anonfun$1.apply$mcV$sp(JobManager.scala:1727)
at 
org.apache.flink.runtime.jobmanager.JobManager$$anonfun$1.apply(JobManager.scala:1723)
at 
org.apache.flink.runtime.jobmanager.JobManager$$anonfun$1.apply(JobManager.scala:1723)
at 
scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24)
at 

[jira] [Created] (FLINK-8800) Set Logging to TRACE for org.apache.flink.runtime.rest.handler.job.metrics.JobVertexMetricsHandler

2018-02-27 Thread Stephan Ewen (JIRA)
Stephan Ewen created FLINK-8800:
---

 Summary: Set Logging to TRACE for 
org.apache.flink.runtime.rest.handler.job.metrics.JobVertexMetricsHandler
 Key: FLINK-8800
 URL: https://issues.apache.org/jira/browse/FLINK-8800
 Project: Flink
  Issue Type: Bug
  Components: REST
Reporter: Stephan Ewen
 Fix For: 1.5.0, 1.6.0


When setting the log level to {{DEBUG}}, the logs are swamped with statements 
as below, making it hard to read the debug logs.

{code}
2018-02-22 13:41:04,016 DEBUG 
org.apache.flink.runtime.rest.handler.job.metrics.JobVertexMetricsHandler  - 
Received request 
/jobs/ec1c9d7a3c413a9523656efa58735009/vertices/ded95c643b42f31cf882a8986207fd30/metrics?get=0.currentLowWatermark.
2018-02-22 13:41:04,048 DEBUG 
org.apache.flink.runtime.rest.handler.job.metrics.JobVertexMetricsHandler  - 
Received request 
/jobs/ec1c9d7a3c413a9523656efa58735009/vertices/eec5890dac9c38f66954443809beb5b0/metrics?get=0.currentLowWatermark.
2018-02-22 13:41:04,052 DEBUG 
org.apache.flink.runtime.rest.handler.job.metrics.JobVertexMetricsHandler  - 
Received request 
/jobs/ec1c9d7a3c413a9523656efa58735009/vertices/2a964ee72788c82cb7d15e352d9a94f6/metrics?get=0.currentLowWatermark.
2018-02-22 13:41:04,079 DEBUG 
org.apache.flink.runtime.rest.handler.job.metrics.JobVertexMetricsHandler  - 
Received request 
/jobs/ec1c9d7a3c413a9523656efa58735009/vertices/1d9c83f6e1879fdbe461aafac16eb8a5/metrics?get=0.currentLowWatermark.
2018-02-22 13:41:04,085 DEBUG 
org.apache.flink.runtime.rest.handler.job.metrics.JobVertexMetricsHandler  - 
Received request 
/jobs/ec1c9d7a3c413a9523656efa58735009/vertices/4063620891a151092c5bcedb218870a6/metrics?get=0.currentLowWatermark.
2018-02-22 13:41:04,094 DEBUG 
org.apache.flink.runtime.rest.handler.job.metrics.JobVertexMetricsHandler  - 
Received request 
/jobs/ec1c9d7a3c413a9523656efa58735009/vertices/2a751c66e0e32aee2cd8120a1a72a4d6/metrics?get=0.currentLowWatermark.
2018-02-22 13:41:04,142 DEBUG 
org.apache.flink.runtime.rest.handler.job.metrics.JobVertexMetricsHandler  - 
Received request 
/jobs/ec1c9d7a3c413a9523656efa58735009/vertices/37ecc85b429bd08d0fd539532055e117/metrics?get=0.currentLowWatermark.
2018-02-22 13:41:04,173 DEBUG 
org.apache.flink.runtime.rest.handler.job.metrics.JobVertexMetricsHandler  - 
Received request 
/jobs/ec1c9d7a3c413a9523656efa58735009/vertices/20e20298680571979f690d36d1a6db36/metrics?get=0.currentLowWatermark.
{code}





[jira] [Created] (FLINK-8798) Make commons-logging a parent-first pattern

2018-02-27 Thread Stephan Ewen (JIRA)
Stephan Ewen created FLINK-8798:
---

 Summary: Make commons-logging a parent-first pattern
 Key: FLINK-8798
 URL: https://issues.apache.org/jira/browse/FLINK-8798
 Project: Flink
  Issue Type: Bug
  Components: Core
Affects Versions: 1.4.1
Reporter: Stephan Ewen
Assignee: Stephan Ewen
 Fix For: 1.5.0, 1.4.2, 1.6.0


The Apache {{commons-logging}} framework does not play well with child-first 
classloading.

We need to make this a parent-first pattern.

As a matter of fact, other frameworks that use inverted classloading (JBoss, 
Tomcat) force this library to always be parent-first as well.
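The pattern check behind parent-first delegation can be sketched as follows (hypothetical simplification of what a child-first classloader consults before loading a class itself):

```java
// A child-first classloader typically consults a list of "always parent-first"
// package prefixes; matching classes are delegated to the parent classloader,
// everything else is attempted in the child first.
class ParentFirstPatterns {
    private final String[] prefixes;

    ParentFirstPatterns(String... prefixes) {
        this.prefixes = prefixes;
    }

    boolean isParentFirst(String className) {
        for (String prefix : prefixes) {
            if (className.startsWith(prefix)) {
                return true;
            }
        }
        return false;
    }
}
```

Adding {{org.apache.commons.logging}} to that prefix list is what this issue proposes.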





[jira] [Created] (FLINK-8791) Fix documentation on how to link dependencies

2018-02-26 Thread Stephan Ewen (JIRA)
Stephan Ewen created FLINK-8791:
---

 Summary: Fix documentation on how to link dependencies
 Key: FLINK-8791
 URL: https://issues.apache.org/jira/browse/FLINK-8791
 Project: Flink
  Issue Type: Bug
  Components: Documentation
Reporter: Stephan Ewen
Assignee: Stephan Ewen
 Fix For: 1.5.0


The documentation in "Linking with Flink" and "Linking with Optional 
Dependencies" is very outdated and gives wrong advice to users.





[jira] [Created] (FLINK-8767) Set the maven.compiler.source and .target properties for Java Quickstart

2018-02-23 Thread Stephan Ewen (JIRA)
Stephan Ewen created FLINK-8767:
---

 Summary: Set the maven.compiler.source and .target properties for 
Java Quickstart
 Key: FLINK-8767
 URL: https://issues.apache.org/jira/browse/FLINK-8767
 Project: Flink
  Issue Type: Sub-task
  Components: Quickstarts
Reporter: Stephan Ewen
Assignee: Stephan Ewen
 Fix For: 1.5.0


Setting these properties helps properly pin the Java version in IntelliJ.
Without them, the Java version keeps switching back to 1.5 in some 
setups.





[jira] [Created] (FLINK-8765) Simplify quickstart properties

2018-02-23 Thread Stephan Ewen (JIRA)
Stephan Ewen created FLINK-8765:
---

 Summary: Simplify quickstart properties
 Key: FLINK-8765
 URL: https://issues.apache.org/jira/browse/FLINK-8765
 Project: Flink
  Issue Type: Sub-task
  Components: Quickstarts
Reporter: Stephan Ewen
Assignee: Stephan Ewen
 Fix For: 1.5.0


This no longer pulls the slf4j and log4j versions out into properties, 
making the quickstarts a bit simpler.

Given that both versions are used only once, and only to provide convenience 
logging in the IDE, they might as well be defined directly 
in the dependencies.






[jira] [Created] (FLINK-8766) Pin scala runtime version for Java Quickstart

2018-02-23 Thread Stephan Ewen (JIRA)
Stephan Ewen created FLINK-8766:
---

 Summary: Pin scala runtime version for Java Quickstart
 Key: FLINK-8766
 URL: https://issues.apache.org/jira/browse/FLINK-8766
 Project: Flink
  Issue Type: Sub-task
  Components: Quickstarts
Reporter: Stephan Ewen
Assignee: Stephan Ewen
 Fix For: 1.5.0


 Follow-up to FLINK-7414, which pinned the Scala version for the Scala Quickstart.





[jira] [Created] (FLINK-8764) Make quickstarts work out of the box for IDE and JAR packaging

2018-02-23 Thread Stephan Ewen (JIRA)
Stephan Ewen created FLINK-8764:
---

 Summary: Make quickstarts work out of the box for IDE and JAR 
packaging
 Key: FLINK-8764
 URL: https://issues.apache.org/jira/browse/FLINK-8764
 Project: Flink
  Issue Type: Sub-task
  Components: Quickstarts
Reporter: Stephan Ewen
Assignee: Stephan Ewen
 Fix For: 1.5.0


We can make the quickstarts work for IntelliJ, Eclipse, and Maven JAR packaging 
out of the box, without the need to pass a profile name during JAR packaging, 
via the following trick:

  - All Flink and Scala dependencies are properly set to 'provided'
  - That way, Maven JAR packaging behaves correctly by default
  - Eclipse adds 'provided' dependencies to the classpath when running 
programs, so it works out of the box
  - There is a profile that automatically activates in IntelliJ that adds the 
necessary dependencies in 'compile' scope to make it run out of the box.






[jira] [Created] (FLINK-8762) Remove unnecessary examples and make "StreamingJob" the default

2018-02-23 Thread Stephan Ewen (JIRA)
Stephan Ewen created FLINK-8762:
---

 Summary: Remove unnecessary examples and make "StreamingJob" the 
default
 Key: FLINK-8762
 URL: https://issues.apache.org/jira/browse/FLINK-8762
 Project: Flink
  Issue Type: Sub-task
  Components: Quickstarts
Reporter: Stephan Ewen
Assignee: Stephan Ewen
 Fix For: 1.5.0


The packaged WordCount example jobs have been reported to not be terribly 
helpful and to simply create noise in the initial project setup.

In addition, setting the main class by default to {{StreamingJob}} creates a 
better out-of-the-box experience for the majority of users. We prominently 
document how to adjust this to use {{BatchJob}} as the main class.





[jira] [Created] (FLINK-8763) Remove obsolete Dummy.java classes from quickstart projects.

2018-02-23 Thread Stephan Ewen (JIRA)
Stephan Ewen created FLINK-8763:
---

 Summary: Remove obsolete Dummy.java classes from quickstart 
projects.
 Key: FLINK-8763
 URL: https://issues.apache.org/jira/browse/FLINK-8763
 Project: Flink
  Issue Type: Sub-task
  Components: Quickstarts
Reporter: Stephan Ewen
Assignee: Stephan Ewen
 Fix For: 1.5.0


These classes no longer seem necessary; the project JavaDocs build properly 
without them.





[jira] [Created] (FLINK-8761) Various improvements to the Quickstarts

2018-02-23 Thread Stephan Ewen (JIRA)
Stephan Ewen created FLINK-8761:
---

 Summary: Various improvements to the Quickstarts
 Key: FLINK-8761
 URL: https://issues.apache.org/jira/browse/FLINK-8761
 Project: Flink
  Issue Type: Improvement
  Components: Quickstarts
Reporter: Stephan Ewen
Assignee: Stephan Ewen
 Fix For: 1.5.0


Various improvements to the Quickstarts to give a smoother out-of-the-box 
experience.

This is broken down into the subtasks below.





[jira] [Created] (FLINK-8738) Converge runtime dependency versions for 'scala-lang' and for 'com.typesafe:config'

2018-02-21 Thread Stephan Ewen (JIRA)
Stephan Ewen created FLINK-8738:
---

 Summary: Converge runtime dependency versions for 'scala-lang' and 
for 'com.typesafe:config'
 Key: FLINK-8738
 URL: https://issues.apache.org/jira/browse/FLINK-8738
 Project: Flink
  Issue Type: Bug
  Components: Build System
Reporter: Stephan Ewen
Assignee: Stephan Ewen
 Fix For: 1.5.0


These dependencies are currently diverged:

{code}
Dependency convergence error for com.typesafe:config:1.3.0 paths to dependency 
are:
+-com.daplatform.flink:txn-api:1.0-SNAPSHOT
  +-org.apache.flink:flink-streaming-java_2.11:1.5-SNAPSHOT
+-org.apache.flink:flink-runtime_2.11:1.5-SNAPSHOT
  +-com.typesafe.akka:akka-actor_2.11:2.4.20
+-com.typesafe:config:1.3.0
and
+-com.daplatform.flink:txn-api:1.0-SNAPSHOT
  +-org.apache.flink:flink-streaming-java_2.11:1.5-SNAPSHOT
+-org.apache.flink:flink-runtime_2.11:1.5-SNAPSHOT
  +-com.typesafe.akka:akka-stream_2.11:2.4.20
+-com.typesafe:ssl-config-core_2.11:0.2.1
  +-com.typesafe:config:1.2.0
{code}

and

{code}
Dependency convergence error for org.scala-lang:scala-library:2.11.12 paths to 
dependency are:
+-com.daplatform.flink:txn-api:1.0-SNAPSHOT
  +-org.apache.flink:flink-streaming-java_2.11:1.5-SNAPSHOT
+-org.apache.flink:flink-runtime_2.11:1.5-SNAPSHOT
  +-org.scala-lang:scala-library:2.11.12
and
+-com.daplatform.flink:txn-api:1.0-SNAPSHOT
  +-org.apache.flink:flink-streaming-java_2.11:1.5-SNAPSHOT
+-org.apache.flink:flink-runtime_2.11:1.5-SNAPSHOT
  +-com.typesafe.akka:akka-actor_2.11:2.4.20
+-org.scala-lang:scala-library:2.11.11
and
+-com.daplatform.flink:txn-api:1.0-SNAPSHOT
  +-org.apache.flink:flink-streaming-java_2.11:1.5-SNAPSHOT
+-org.apache.flink:flink-runtime_2.11:1.5-SNAPSHOT
  +-com.typesafe.akka:akka-actor_2.11:2.4.20
+-org.scala-lang.modules:scala-java8-compat_2.11:0.7.0
  +-org.scala-lang:scala-library:2.11.7
and
+-com.daplatform.flink:txn-api:1.0-SNAPSHOT
  +-org.apache.flink:flink-streaming-java_2.11:1.5-SNAPSHOT
+-org.apache.flink:flink-runtime_2.11:1.5-SNAPSHOT
  +-com.typesafe.akka:akka-stream_2.11:2.4.20
+-org.scala-lang:scala-library:2.11.11
and
+-com.daplatform.flink:txn-api:1.0-SNAPSHOT
  +-org.apache.flink:flink-streaming-java_2.11:1.5-SNAPSHOT
+-org.apache.flink:flink-runtime_2.11:1.5-SNAPSHOT
  +-com.typesafe.akka:akka-stream_2.11:2.4.20
+-com.typesafe:ssl-config-core_2.11:0.2.1
  +-org.scala-lang:scala-library:2.11.8
and
+-com.daplatform.flink:txn-api:1.0-SNAPSHOT
  +-org.apache.flink:flink-streaming-java_2.11:1.5-SNAPSHOT
+-org.apache.flink:flink-runtime_2.11:1.5-SNAPSHOT
  +-com.typesafe.akka:akka-stream_2.11:2.4.20
+-com.typesafe:ssl-config-core_2.11:0.2.1
  +-org.scala-lang.modules:scala-parser-combinators_2.11:1.0.4
+-org.scala-lang:scala-library:2.11.6
and
+-com.daplatform.flink:txn-api:1.0-SNAPSHOT
  +-org.apache.flink:flink-streaming-java_2.11:1.5-SNAPSHOT
+-org.apache.flink:flink-runtime_2.11:1.5-SNAPSHOT
  +-com.typesafe.akka:akka-protobuf_2.11:2.4.20
+-org.scala-lang:scala-library:2.11.11
and
+-com.daplatform.flink:txn-api:1.0-SNAPSHOT
  +-org.apache.flink:flink-streaming-java_2.11:1.5-SNAPSHOT
+-org.apache.flink:flink-runtime_2.11:1.5-SNAPSHOT
  +-com.typesafe.akka:akka-slf4j_2.11:2.4.20
+-org.scala-lang:scala-library:2.11.11
and
+-com.daplatform.flink:txn-api:1.0-SNAPSHOT
  +-org.apache.flink:flink-streaming-java_2.11:1.5-SNAPSHOT
+-org.apache.flink:flink-runtime_2.11:1.5-SNAPSHOT
  +-org.clapper:grizzled-slf4j_2.11:1.0.2
+-org.scala-lang:scala-library:2.11.0
and
+-com.daplatform.flink:txn-api:1.0-SNAPSHOT
  +-org.apache.flink:flink-streaming-java_2.11:1.5-SNAPSHOT
+-org.apache.flink:flink-runtime_2.11:1.5-SNAPSHOT
  +-com.twitter:chill_2.11:0.7.4
+-org.scala-lang:scala-library:2.11.7
{code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (FLINK-8729) Migrate JSONGenerator from Sling to Jackson

2018-02-21 Thread Stephan Ewen (JIRA)
Stephan Ewen created FLINK-8729:
---

 Summary: Migrate JSONGenerator from Sling to Jackson
 Key: FLINK-8729
 URL: https://issues.apache.org/jira/browse/FLINK-8729
 Project: Flink
  Issue Type: Bug
  Components: Streaming
Reporter: Stephan Ewen


The {{org.apache.flink.streaming.api.graph.JSONGenerator}} uses Sling for JSON 
encoding, adding an extra dependency. All other Flink parts use a specially 
shaded Jackson dependency.

Migrating the JSONGenerator would allow us to drop a dependency and make the 
code more homogeneous.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (FLINK-8727) Test instability in SlotPoolRpcTest.testCancelSlotAllocationWithoutResourceManager

2018-02-21 Thread Stephan Ewen (JIRA)
Stephan Ewen created FLINK-8727:
---

 Summary: Test instability in 
SlotPoolRpcTest.testCancelSlotAllocationWithoutResourceManager
 Key: FLINK-8727
 URL: https://issues.apache.org/jira/browse/FLINK-8727
 Project: Flink
  Issue Type: Bug
  Components: Tests
Reporter: Stephan Ewen


Travis build and logs: https://api.travis-ci.org/v3/job/344253865/log.txt

[~till.rohrmann] FYI.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (FLINK-8696) Remove JobManager local mode from the Unix Shell Scripts

2018-02-19 Thread Stephan Ewen (JIRA)
Stephan Ewen created FLINK-8696:
---

 Summary: Remove JobManager local mode from the Unix Shell Scripts
 Key: FLINK-8696
 URL: https://issues.apache.org/jira/browse/FLINK-8696
 Project: Flink
  Issue Type: Sub-task
  Components: Startup Shell Scripts
Reporter: Stephan Ewen
Assignee: Stephan Ewen
 Fix For: 1.5.0


In order to work towards removing the local JobManager mode, the shell scripts 
need to be changed to no longer use or assume that mode.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (FLINK-8695) Move RocksDB State Backend from 'flink-contrib' to 'flink-state-backends'

2018-02-19 Thread Stephan Ewen (JIRA)
Stephan Ewen created FLINK-8695:
---

 Summary: Move RocksDB State Backend from 'flink-contrib' to 
'flink-state-backends'
 Key: FLINK-8695
 URL: https://issues.apache.org/jira/browse/FLINK-8695
 Project: Flink
  Issue Type: Improvement
  Components: State Backends, Checkpointing
Reporter: Stephan Ewen
Assignee: Stephan Ewen
 Fix For: 1.5.0


Having the RocksDB State Backend in {{flink-contrib}} is a bit of an 
understatement...



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (FLINK-8682) Make start/stop cluster scripts work without SSH for local HA setups

2018-02-16 Thread Stephan Ewen (JIRA)
Stephan Ewen created FLINK-8682:
---

 Summary: Make start/stop cluster scripts work without SSH for 
local HA setups
 Key: FLINK-8682
 URL: https://issues.apache.org/jira/browse/FLINK-8682
 Project: Flink
  Issue Type: Improvement
  Components: Startup Shell Scripts
Affects Versions: 1.4.0
Reporter: Stephan Ewen
Assignee: Stephan Ewen
 Fix For: 1.5.0


The startup should work for a purely local (testing) cluster setup without SSH.

While the shell scripts handle this correctly for TaskManagers, they don't 
handle it correctly for JobManagers. As a consequence, {{start-cluster.sh}} 
does not work without SSH when high availability is enabled.




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (FLINK-8681) Remove planVisualizer.html move notice

2018-02-16 Thread Stephan Ewen (JIRA)
Stephan Ewen created FLINK-8681:
---

 Summary: Remove planVisualizer.html move notice
 Key: FLINK-8681
 URL: https://issues.apache.org/jira/browse/FLINK-8681
 Project: Flink
  Issue Type: Improvement
  Components: Build System
Affects Versions: 1.4.0
Reporter: Stephan Ewen
Assignee: Stephan Ewen
 Fix For: 1.5.0


The {{planVisualizer.html}} for optimizer plans is no longer in the Flink 
distribution, but we hold a notice there that the visualizer has moved to the 
website.

That notice has been there for many versions (since Flink 1.0) and can be 
removed now.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (FLINK-8680) Name printing sinks by default.

2018-02-16 Thread Stephan Ewen (JIRA)
Stephan Ewen created FLINK-8680:
---

 Summary: Name printing sinks by default.
 Key: FLINK-8680
 URL: https://issues.apache.org/jira/browse/FLINK-8680
 Project: Flink
  Issue Type: Improvement
Affects Versions: 1.4.0
Reporter: Stephan Ewen
Assignee: Stephan Ewen
 Fix For: 1.5.0


The sinks that print to std. out and std. err show up as "Sink: Unnamed" in 
logs and the UI.

They should be named "Print to Std. Out" and "Print to Std. Err" by default.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (FLINK-8549) Move TimerServiceOptions to TaskManagerOptions

2018-02-02 Thread Stephan Ewen (JIRA)
Stephan Ewen created FLINK-8549:
---

 Summary: Move TimerServiceOptions to TaskManagerOptions
 Key: FLINK-8549
 URL: https://issues.apache.org/jira/browse/FLINK-8549
 Project: Flink
  Issue Type: Improvement
  Components: Local Runtime
Reporter: Stephan Ewen
Assignee: Stephan Ewen
 Fix For: 1.5.0


The {{TimerServiceOptions}} are in the wrong place (which prevents generation 
of config docs) and cause over-fragmentation of the options in the code base.

I propose to simply move the one option from that class to the 
{{TaskManagerOptions}}, as it relates to task execution. Other shutdown-related 
options are in there already.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (FLINK-8548) Add Streaming State Machine Example

2018-02-02 Thread Stephan Ewen (JIRA)
Stephan Ewen created FLINK-8548:
---

 Summary: Add Streaming State Machine Example
 Key: FLINK-8548
 URL: https://issues.apache.org/jira/browse/FLINK-8548
 Project: Flink
  Issue Type: Sub-task
  Components: Examples
Reporter: Stephan Ewen
Assignee: Stephan Ewen
 Fix For: 1.5.0


Add the example from 
https://github.com/StephanEwen/flink-demos/tree/master/streaming-state-machine 
to the Flink examples.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (FLINK-8540) FileStateHandles must not attempt to delete their parent directory.

2018-01-31 Thread Stephan Ewen (JIRA)
Stephan Ewen created FLINK-8540:
---

 Summary: FileStateHandles must not attempt to delete their parent 
directory.
 Key: FLINK-8540
 URL: https://issues.apache.org/jira/browse/FLINK-8540
 Project: Flink
  Issue Type: Sub-task
  Components: State Backends, Checkpointing
Reporter: Stephan Ewen
Assignee: Stephan Ewen
 Fix For: 1.5.0


Currently, every file disposal checks if the parent directory is now empty, and 
deletes it if that is the case. That is not only inefficient, but prohibitively 
expensive on some systems, like Amazon S3.

With the resolution of [FLINK-8539], this will no longer be necessary.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (FLINK-8539) Introduce "CompletedCheckpointStorageLocation" to explicitly handle disposal of checkpoint storage locations

2018-01-31 Thread Stephan Ewen (JIRA)
Stephan Ewen created FLINK-8539:
---

 Summary: Introduce "CompletedCheckpointStorageLocation" to 
explicitly handle disposal of checkpoint storage locations
 Key: FLINK-8539
 URL: https://issues.apache.org/jira/browse/FLINK-8539
 Project: Flink
  Issue Type: Sub-task
  Components: State Backends, Checkpointing
Reporter: Stephan Ewen
Assignee: Stephan Ewen
 Fix For: 1.5.0


The storage location of completed checkpoints lacks a proper representation. 
Because of that, there is no place that can handle the deletion of a checkpoint 
directory, or the dropping of a checkpoint-specific table.

Current workaround for file systems is, for example, that every file disposal 
checks if the parent directory is now empty, and deletes it if that is the 
case. That is not only inefficient, but prohibitively expensive on some 
systems, like Amazon S3.

Properly representing the storage location for completed checkpoints allows us 
to add a disposal call for that location.

That {{CompletedCheckpointStorageLocation}} can also be used to capture 
"external pointers", metadata, and even allow us to use custom serialization 
and deserialization of the metadata in the future, making the abstraction more 
extensible by allowing users to introduce new types of state handles.
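A minimal sketch of what such an abstraction could look like, with an in-memory implementation for illustration (method names follow the issue's intent, but the signatures and the {{InMemoryCheckpointLocation}} class are assumptions, not the final Flink interface):

```java
import java.io.IOException;
import java.util.HashSet;
import java.util.Set;

// Sketch: a completed checkpoint's storage location that can dispose of
// itself in one call, instead of every file disposal probing whether the
// parent directory became empty. Names are illustrative.
interface CompletedCheckpointStorageLocation {
    String getExternalPointer();
    void disposeStorageLocation() throws IOException;
}

// Minimal in-memory implementation for illustration only.
public class InMemoryCheckpointLocation implements CompletedCheckpointStorageLocation {
    private final String pointer;
    private final Set<String> liveLocations;

    InMemoryCheckpointLocation(String pointer, Set<String> liveLocations) {
        this.pointer = pointer;
        this.liveLocations = liveLocations;
        liveLocations.add(pointer);
    }

    @Override
    public String getExternalPointer() {
        return pointer;
    }

    @Override
    public void disposeStorageLocation() {
        // One call drops the whole location (directory, table, ...).
        liveLocations.remove(pointer);
    }

    public static void main(String[] args) throws IOException {
        Set<String> live = new HashSet<>();
        CompletedCheckpointStorageLocation loc =
                new InMemoryCheckpointLocation("s3://bucket/chk-42", live);
        System.out.println(live.contains("s3://bucket/chk-42")); // true
        loc.disposeStorageLocation();
        System.out.println(live.isEmpty()); // true
    }
}
```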



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (FLINK-8531) Support separation of "Exclusive", "Shared" and "Task owned" state

2018-01-30 Thread Stephan Ewen (JIRA)
Stephan Ewen created FLINK-8531:
---

 Summary: Support separation of "Exclusive", "Shared" and "Task 
owned" state
 Key: FLINK-8531
 URL: https://issues.apache.org/jira/browse/FLINK-8531
 Project: Flink
  Issue Type: Sub-task
  Components: State Backends, Checkpointing
Reporter: Stephan Ewen
Assignee: Stephan Ewen
 Fix For: 1.5.0


Currently, all state created at a certain checkpoint goes into the directory 
{{chk-id}}.

With incremental checkpointing, some state is shared across checkpoints and is 
referenced by newer checkpoints. That way, old {{chk-id}} directories stay 
around, containing some shared chunks. That makes it hard for both users and 
cleanup hooks to determine when a {{chk-x}} directory can be deleted.

The same holds for state that can only ever be dropped by certain operators on 
the TaskManager, never by the JobManager / CheckpointCoordinator. Examples of 
such state are write-ahead logs, which need to be retained until the move to 
the target system is complete, which may in some cases be later than when the 
checkpoint that created them is disposed.

I propose to introduce different scopes for state:
  - **EXCLUSIVE** is for state that belongs to one checkpoint only
  - **SHARED** is for state that is possibly part of multiple checkpoints
  - **TASKOWNED** is for state that must never be dropped by the JobManager.

For file-based checkpoint targets, I propose that we have the following 
directory layout:
{code}
/user-defined-checkpoint-dir
|
+ --shared/
+ --taskowned/
+ --chk-1/
+ --chk-2/
+ --chk-3/
...
{code}
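The mapping from scope to directory in the layout above could be sketched as follows (the {{CheckpointScopeSketch}} class and {{resolveDirectory}} method are hypothetical, not Flink API):

```java
// Hypothetical sketch: mapping a state scope to a storage directory under
// the proposed checkpoint directory layout. Names are illustrative only.
public class CheckpointScopeSketch {

    enum CheckpointScope { EXCLUSIVE, SHARED, TASKOWNED }

    /** Resolves the directory for a piece of state under the proposed layout. */
    static String resolveDirectory(String checkpointDir, CheckpointScope scope, long checkpointId) {
        switch (scope) {
            case SHARED:
                return checkpointDir + "/shared";      // shared across checkpoints
            case TASKOWNED:
                return checkpointDir + "/taskowned";   // never dropped by the JobManager
            case EXCLUSIVE:
            default:
                return checkpointDir + "/chk-" + checkpointId; // one checkpoint only
        }
    }

    public static void main(String[] args) {
        String base = "/user-defined-checkpoint-dir";
        System.out.println(resolveDirectory(base, CheckpointScope.EXCLUSIVE, 1)); // .../chk-1
        System.out.println(resolveDirectory(base, CheckpointScope.SHARED, 1));    // .../shared
    }
}
```

With this separation, a cleanup hook can delete {{chk-x}} directories without scanning them for shared chunks.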



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (FLINK-8499) Kryo must not be child-first loaded

2018-01-23 Thread Stephan Ewen (JIRA)
Stephan Ewen created FLINK-8499:
---

 Summary: Kryo must not be child-first loaded
 Key: FLINK-8499
 URL: https://issues.apache.org/jira/browse/FLINK-8499
 Project: Flink
  Issue Type: Bug
  Components: Core
Affects Versions: 1.4.0
Reporter: Stephan Ewen
Assignee: Stephan Ewen
 Fix For: 1.5.0, 1.4.1


Kryo classes are part of the Flink API and hence move between Flink's core 
(serializers) and the user application (Avro-Kryo utils).

Duplicating the Kryo dependency through reversed classloading causes problems. 
If Kryo is in the user application jar, together with Avro, the following error 
happens:

(This seems to be a semi-bug in the JVM, because it should clearly be a 
{{ClassCastException}}, not such a cryptic bytecode error.)

{code}
java.lang.VerifyError: Bad type on operand stack
Exception Details:
  Location:

org/apache/flink/formats/avro/utils/AvroKryoSerializerUtils.addAvroGenericDataArrayRegistration(Ljava/util/LinkedHashMap;)V
 @23: invokespecial
  Reason:
Type 
'org/apache/flink/api/java/typeutils/runtime/kryo/Serializers$SpecificInstanceCollectionSerializerForArrayList'
 (current frame, stack[7]) is not assignable to 
'com/esotericsoftware/kryo/Serializer'
  Current Frame:
bci: @23
flags: { }
locals: { 'org/apache/flink/formats/avro/utils/AvroKryoSerializerUtils', 
'java/util/LinkedHashMap' }
stack: { 'java/util/LinkedHashMap', 'java/lang/String', uninitialized 6, 
uninitialized 6, 'java/lang/Class', uninitialized 12, uninitialized 12, 
'org/apache/flink/api/java/typeutils/runtime/kryo/Serializers$SpecificInstanceCollectionSerializerForArrayList'
 }
  Bytecode:
0x000: 2b12 05b6 000b bb00 0c59 1205 bb00 0d59
0x010: bb00 0659 b700 0eb7 000f b700 10b6 0011
0x020: 57b1   


{code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (FLINK-8461) Wrong logger configurations for shaded Netty

2018-01-19 Thread Stephan Ewen (JIRA)
Stephan Ewen created FLINK-8461:
---

 Summary: Wrong logger configurations for shaded Netty
 Key: FLINK-8461
 URL: https://issues.apache.org/jira/browse/FLINK-8461
 Project: Flink
  Issue Type: Bug
  Components: Logging
Affects Versions: 1.4.0
Reporter: Stephan Ewen
Assignee: Stephan Ewen
 Fix For: 1.5.0, 1.4.1


We started shading Akka's Netty in Flink 1.4.

The logger configurations (log4j.properties, logback.xml) were not updated to 
the shaded class names.

One result of this is incorrect/misleading error logging of the Netty handlers 
during shutdown, which pollute the logs and cause Yarn end-to-end tests to fail.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (FLINK-8374) Unstable Yarn tests due to Akka Shutdown Exception Logging

2018-01-05 Thread Stephan Ewen (JIRA)
Stephan Ewen created FLINK-8374:
---

 Summary: Unstable Yarn tests due to Akka Shutdown Exception Logging
 Key: FLINK-8374
 URL: https://issues.apache.org/jira/browse/FLINK-8374
 Project: Flink
  Issue Type: Bug
  Components: Tests
Affects Versions: 1.4.0
Reporter: Stephan Ewen
Assignee: Stephan Ewen
Priority: Critical
 Fix For: 1.5.0


Akka may log the following in some cases during shutdown:

{{java.util.concurrent.RejectedExecutionException: Worker has already been 
shutdown}}

The Yarn tests search the logs for unexpected exceptions and fail when 
encountering that exception. We should whitelist it, as it is not a problem, 
merely an Akka shutdown artifact.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (FLINK-8373) Inconsistencies in some FileSystem directory functions

2018-01-05 Thread Stephan Ewen (JIRA)
Stephan Ewen created FLINK-8373:
---

 Summary: Inconsistencies in some FileSystem directory functions
 Key: FLINK-8373
 URL: https://issues.apache.org/jira/browse/FLINK-8373
 Project: Flink
  Issue Type: Bug
  Components: Core
Affects Versions: 1.4.0
Reporter: Stephan Ewen
Assignee: Stephan Ewen
Priority: Minor
 Fix For: 1.5.0


There are some minor differences in the behavior of some file system 
functions, like {{mkdirs()}}. On some file systems, it tolerates existing 
directories or files in place of parent directories. Some return false in an 
error case, some throw an exception.

I encountered this while writing tests for the file-based state backends. We 
should harmonize the behavior of {{FileSystem.mkdirs()}}.

I suggest adopting the behavior used by HDFS, which seems the most correct.
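The HDFS-style contract can be illustrated with {{java.nio.file.Files.createDirectories}}, which behaves similarly (idempotent for existing directories, creates missing parents, fails on a regular file in the way); the {{MkdirsContractSketch}} helper name is illustrative:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Sketch of the HDFS-style mkdirs contract: creating an already-existing
// directory succeeds (idempotent), missing parents are created, and a
// regular file in the way is an error (FileAlreadyExistsException here).
public class MkdirsContractSketch {

    static boolean mkdirs(Path dir) throws IOException {
        Files.createDirectories(dir); // idempotent; creates missing parents
        return true;
    }

    public static void main(String[] args) throws IOException {
        Path base = Files.createTempDirectory("mkdirs-contract");
        Path nested = base.resolve("a").resolve("b").resolve("c");
        System.out.println(mkdirs(nested)); // true: parents created
        System.out.println(mkdirs(nested)); // true: already exists, no error
    }
}
```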



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (FLINK-8268) Test instability for 'TwoPhaseCommitSinkFunctionTest'

2017-12-15 Thread Stephan Ewen (JIRA)
Stephan Ewen created FLINK-8268:
---

 Summary: Test instability for 'TwoPhaseCommitSinkFunctionTest'
 Key: FLINK-8268
 URL: https://issues.apache.org/jira/browse/FLINK-8268
 Project: Flink
  Issue Type: Bug
  Components: Tests
Affects Versions: 1.5.0
Reporter: Stephan Ewen
Priority: Critical


The following exception / failure message occurs.
{code}
Tests run: 5, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 1.824 sec <<< 
FAILURE! - in 
org.apache.flink.streaming.api.functions.sink.TwoPhaseCommitSinkFunctionTest
testIgnoreCommitExceptionDuringRecovery(org.apache.flink.streaming.api.functions.sink.TwoPhaseCommitSinkFunctionTest)
  Time elapsed: 0.068 sec  <<< ERROR!
java.lang.Exception: Could not complete snapshot 0 for operator MockTask (1/1).
at java.io.FileOutputStream.writeBytes(Native Method)
at java.io.FileOutputStream.write(FileOutputStream.java:326)
at sun.nio.cs.StreamEncoder.writeBytes(StreamEncoder.java:221)
at sun.nio.cs.StreamEncoder.implFlushBuffer(StreamEncoder.java:291)
at sun.nio.cs.StreamEncoder.implFlush(StreamEncoder.java:295)
at sun.nio.cs.StreamEncoder.flush(StreamEncoder.java:141)
at java.io.OutputStreamWriter.flush(OutputStreamWriter.java:229)
at java.io.BufferedWriter.flush(BufferedWriter.java:254)
at 
org.apache.flink.streaming.api.functions.sink.TwoPhaseCommitSinkFunctionTest$FileBasedSinkFunction.preCommit(TwoPhaseCommitSinkFunctionTest.java:313)
at 
org.apache.flink.streaming.api.functions.sink.TwoPhaseCommitSinkFunctionTest$FileBasedSinkFunction.preCommit(TwoPhaseCommitSinkFunctionTest.java:288)
at 
org.apache.flink.streaming.api.functions.sink.TwoPhaseCommitSinkFunction.snapshotState(TwoPhaseCommitSinkFunction.java:290)
at 
org.apache.flink.streaming.util.functions.StreamingFunctionUtils.trySnapshotFunctionState(StreamingFunctionUtils.java:118)
at 
org.apache.flink.streaming.util.functions.StreamingFunctionUtils.snapshotFunctionState(StreamingFunctionUtils.java:99)
at 
org.apache.flink.streaming.api.operators.AbstractUdfStreamOperator.snapshotState(AbstractUdfStreamOperator.java:90)
at 
org.apache.flink.streaming.api.operators.AbstractStreamOperator.snapshotState(AbstractStreamOperator.java:357)
at 
org.apache.flink.streaming.util.AbstractStreamOperatorTestHarness.snapshot(AbstractStreamOperatorTestHarness.java:459)
at 
org.apache.flink.streaming.api.functions.sink.TwoPhaseCommitSinkFunctionTest.testIgnoreCommitExceptionDuringRecovery(TwoPhaseCommitSinkFunctionTest.java:208)
{code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (FLINK-8264) Add Scala to the parent-first loading patterns

2017-12-14 Thread Stephan Ewen (JIRA)
Stephan Ewen created FLINK-8264:
---

 Summary: Add Scala to the parent-first loading patterns
 Key: FLINK-8264
 URL: https://issues.apache.org/jira/browse/FLINK-8264
 Project: Flink
  Issue Type: Improvement
  Components: Core
Affects Versions: 1.4.0
Reporter: Stephan Ewen
Assignee: Stephan Ewen
 Fix For: 1.5.0, 1.4.1


A confusing experience happens when users accidentally package the Scala 
Library into their jar file. The reversed class loading duplicates Scala's 
classes, leading to exceptions like the one below.

By adding {{scala.}} to the default 'parent-first-patterns' we can improve the 
user experience in such situations.

Exception Stack Trace:
{code}
java.lang.ClassCastException: cannot assign instance of 
org.peopleinmotion.TestFunction$$anonfun$1 to field 
org.apache.flink.streaming.api.scala.DataStream$$anon$7.cleanFun$6 of type 
scala.Function1 in instance of 
org.apache.flink.streaming.api.scala.DataStream$$anon$7
at 
java.io.ObjectStreamClass$FieldReflector.setObjFieldValues(ObjectStreamClass.java:2233)
at 
java.io.ObjectStreamClass.setObjFieldValues(ObjectStreamClass.java:1405)
at 
java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2288)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2206)
at 
java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2064)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1568)
at 
java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2282)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2206)
at 
java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2064)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1568)
at java.io.ObjectInputStream.readObject(ObjectInputStream.java:428)
at 
org.apache.flink.util.InstantiationUtil.deserializeObject(InstantiationUtil.java:290)
at 
org.apache.flink.util.InstantiationUtil.readObjectFromConfig(InstantiationUtil.java:248)
at 
org.apache.flink.streaming.api.graph.StreamConfig.getStreamOperator(StreamConfig.java:220)
... 6 more
{code}
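The effect of adding {{scala.}} to the parent-first patterns can be sketched as a simple prefix check (a simplification of the real class loader logic; the class and method names here are illustrative):

```java
import java.util.Arrays;
import java.util.List;

// Illustrative sketch of parent-first pattern matching: classes whose fully
// qualified name starts with one of the patterns are delegated to the parent
// class loader, instead of being loaded child-first from the user jar. With
// "scala." in the list, scala.Function1 is resolved from the framework
// classpath, avoiding the duplicated-class ClassCastException above.
public class ParentFirstSketch {

    static final List<String> PARENT_FIRST_PATTERNS =
            Arrays.asList("java.", "scala.", "org.apache.flink.");

    static boolean loadParentFirst(String className) {
        return PARENT_FIRST_PATTERNS.stream().anyMatch(className::startsWith);
    }

    public static void main(String[] args) {
        System.out.println(loadParentFirst("scala.Function1"));        // true
        System.out.println(loadParentFirst("org.peopleinmotion.Foo")); // false
    }
}
```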



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (FLINK-8263) Wrong packaging of flink-core in scala quickstart

2017-12-14 Thread Stephan Ewen (JIRA)
Stephan Ewen created FLINK-8263:
---

 Summary: Wrong packaging of flink-core in scala quickstart
 Key: FLINK-8263
 URL: https://issues.apache.org/jira/browse/FLINK-8263
 Project: Flink
  Issue Type: Bug
  Components: Quickstarts
Affects Versions: 1.4.0
Reporter: Stephan Ewen
Assignee: Stephan Ewen
Priority: Blocker
 Fix For: 1.5.0, 1.4.1


The scala quickstart currently does not set {{flink-core}} to "provided" in the 
"build-jar" profile.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (FLINK-8261) Typos in the shading exclusion for jsr305 in the quickstarts

2017-12-14 Thread Stephan Ewen (JIRA)
Stephan Ewen created FLINK-8261:
---

 Summary: Typos in the shading exclusion for jsr305 in the 
quickstarts
 Key: FLINK-8261
 URL: https://issues.apache.org/jira/browse/FLINK-8261
 Project: Flink
  Issue Type: Bug
  Components: Quickstarts
Affects Versions: 1.4.0
Reporter: Stephan Ewen
Assignee: Stephan Ewen


This affects both the Java and the Scala quickstarts.

The typo is {{findbgs}} instead of {{findbugs}}.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (FLINK-8196) Fix Hadoop Servlet Dependency Exclusion

2017-12-04 Thread Stephan Ewen (JIRA)
Stephan Ewen created FLINK-8196:
---

 Summary: Fix Hadoop Servlet Dependency Exclusion
 Key: FLINK-8196
 URL: https://issues.apache.org/jira/browse/FLINK-8196
 Project: Flink
  Issue Type: Bug
  Components: Build System
Reporter: Stephan Ewen
Assignee: Stephan Ewen
Priority: Blocker
 Fix For: 1.4.0


We currently exclude the `javax.servlet` API dependency, which is unfortunately 
needed as a core dependency by Hadoop 2.7.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (FLINK-8125) Support limiting the number of open FileSystem connections

2017-11-21 Thread Stephan Ewen (JIRA)
Stephan Ewen created FLINK-8125:
---

 Summary: Support limiting the number of open FileSystem connections
 Key: FLINK-8125
 URL: https://issues.apache.org/jira/browse/FLINK-8125
 Project: Flink
  Issue Type: Improvement
  Components: Core
Reporter: Stephan Ewen
Assignee: Stephan Ewen
 Fix For: 1.4.1, 1.5.0


We need a way to limit the number of streams that Flink FileSystems 
concurrently open.
For example, for very small HDFS clusters with few RPC handlers, a large Flink 
job trying to build up many connections during a checkpoint causes failures due 
to rejected connections.

I propose to add a file system that can wrap another existing file system. The 
wrapping file system may track the progress of streams and close streams that 
have been inactive for too long, to prevent stalled streams from taking up the 
complete pool.
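The wrapping idea can be sketched with a semaphore that caps concurrently open streams (class and method names are illustrative, not the eventual Flink implementation):

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.concurrent.Semaphore;

// Sketch of the proposed wrapping file system: a Semaphore caps the number
// of concurrently open streams; closing a stream releases its permit.
public class LimitedStreamsSketch {

    private final Semaphore permits;

    LimitedStreamsSketch(int maxOpenStreams) {
        this.permits = new Semaphore(maxOpenStreams);
    }

    /** Opens a stream, blocking while the pool of connections is exhausted. */
    InputStream open(byte[] data) throws InterruptedException {
        permits.acquire();
        return new ByteArrayInputStream(data) {
            @Override
            public void close() throws IOException {
                super.close();
                permits.release(); // closing frees a connection for others
            }
        };
    }

    int availablePermits() {
        return permits.availablePermits();
    }

    public static void main(String[] args) throws Exception {
        LimitedStreamsSketch fs = new LimitedStreamsSketch(2);
        InputStream a = fs.open(new byte[]{1});
        InputStream b = fs.open(new byte[]{2});
        System.out.println(fs.availablePermits()); // 0: pool exhausted
        a.close();
        System.out.println(fs.availablePermits()); // 1: one connection freed
        b.close();
    }
}
```

A real implementation would additionally need the inactivity timeout mentioned above, so a stalled stream cannot hold its permit forever.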



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (FLINK-7997) Avro should be always in the user code

2017-11-06 Thread Stephan Ewen (JIRA)
Stephan Ewen created FLINK-7997:
---

 Summary: Avro should be always in the user code
 Key: FLINK-7997
 URL: https://issues.apache.org/jira/browse/FLINK-7997
 Project: Flink
  Issue Type: Improvement
Reporter: Stephan Ewen
Assignee: Stephan Ewen
 Fix For: 1.4.0


Having Avro in the user code space makes it possible for users to use Avro 
versions different from the ones pulled in by an overloaded classpath (for 
example when Hadoop is on the classpath).

This is possible through the new child-first classloading in Flink 1.4.

Also, this should fix the problem of "X cannot be cast to X", because Avro 
classes will be scoped to the user code class loader, and the Avro schema cache 
will not be JVM-wide.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (FLINK-7973) Fix service shading relocation for S3 file systems

2017-11-03 Thread Stephan Ewen (JIRA)
Stephan Ewen created FLINK-7973:
---

 Summary: Fix service shading relocation for S3 file systems
 Key: FLINK-7973
 URL: https://issues.apache.org/jira/browse/FLINK-7973
 Project: Flink
  Issue Type: Bug
Reporter: Stephan Ewen
Assignee: Stephan Ewen
Priority: Blocker
 Fix For: 1.4.0


The shade plugin currently relocates services incorrectly, applying relocation 
patterns multiple times.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (FLINK-7972) Move SerializationSchema to flink-core

2017-11-03 Thread Stephan Ewen (JIRA)
Stephan Ewen created FLINK-7972:
---

 Summary: Move SerializationSchema to flink-core
 Key: FLINK-7972
 URL: https://issues.apache.org/jira/browse/FLINK-7972
 Project: Flink
  Issue Type: Improvement
  Components: Streaming
Reporter: Stephan Ewen
Assignee: Stephan Ewen
 Fix For: 1.4.0


The {{SerializationSchema}} and its related classes are currently in 
{{flink-streaming-java}}.

API level projects that depend on those classes hence pull in a dependency on 
runtime classes.

For example, this would be required in order to make {{flink-avro}} independent 
of runtime dependencies and Scala versions, same for the future for thrift 
format support, for Hbase connectors, etc.

This should not be API breaking, since we can keep the classes in the same 
namespace and only move them "upstream" in the dependency structure, or we can 
keep classes in the original namespace that extend the moved classes in 
{{flink-core}}.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (FLINK-7968) Deduplicate serializer classes between runtime and queryable state

2017-11-02 Thread Stephan Ewen (JIRA)
Stephan Ewen created FLINK-7968:
---

 Summary: Deduplicate serializer classes between runtime and 
queryable state
 Key: FLINK-7968
 URL: https://issues.apache.org/jira/browse/FLINK-7968
 Project: Flink
  Issue Type: Bug
  Components: Queryable State
Reporter: Stephan Ewen
Assignee: Stephan Ewen
 Fix For: 1.4.0


Some serializer classes were duplicated into {{flink-queryable-state}} to 
avoid a dependency on {{flink-runtime}}.

The proper solution here is to move the classes to the shared {{flink-core}} 
project, because these classes are actually useful in a series of API utilities 
and they do not have any dependency on other flink classes at all.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (FLINK-7925) Add CheckpointingOptions

2017-10-25 Thread Stephan Ewen (JIRA)
Stephan Ewen created FLINK-7925:
---

 Summary:  Add CheckpointingOptions
 Key: FLINK-7925
 URL: https://issues.apache.org/jira/browse/FLINK-7925
 Project: Flink
  Issue Type: Sub-task
  Components: State Backends, Checkpointing
Reporter: Stephan Ewen
Assignee: Stephan Ewen
 Fix For: 1.4.0


The CheckpointingOptions should consolidate all checkpointing and state 
backend-related
settings that were previously split across different classes.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (FLINK-7924) Fix incorrect names of checkpoint options

2017-10-25 Thread Stephan Ewen (JIRA)
Stephan Ewen created FLINK-7924:
---

 Summary: Fix incorrect names of checkpoint options
 Key: FLINK-7924
 URL: https://issues.apache.org/jira/browse/FLINK-7924
 Project: Flink
  Issue Type: Sub-task
  Components: State Backends, Checkpointing
Reporter: Stephan Ewen
Assignee: Stephan Ewen
 Fix For: 1.4.0


Checkpoint options are incorrectly always called 'FULL_CHECKPOINT', when in 
fact checkpoints may always be incremental and only savepoints have to be full 
and self-contained.

Initially, we planned to add options for multiple kinds of checkpoints, like 
checkpoints that are forced to be full, and checkpoints that are incremental. 
That is not necessary at this point.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (FLINK-7842) Shade jackson (org.codehaus.jackson) in flink-shaded-hadoop2

2017-10-14 Thread Stephan Ewen (JIRA)
Stephan Ewen created FLINK-7842:
---

 Summary: Shade jackson (org.codehaus.jackson) in 
flink-shaded-hadoop2
 Key: FLINK-7842
 URL: https://issues.apache.org/jira/browse/FLINK-7842
 Project: Flink
  Issue Type: Sub-task
  Components: Build System
Reporter: Stephan Ewen
Assignee: Stephan Ewen
 Fix For: 1.4.0






--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (FLINK-7841) Add docs for Flink's S3 support

2017-10-14 Thread Stephan Ewen (JIRA)
Stephan Ewen created FLINK-7841:
---

 Summary: Add docs for Flink's S3 support
 Key: FLINK-7841
 URL: https://issues.apache.org/jira/browse/FLINK-7841
 Project: Flink
  Issue Type: Bug
  Components: Documentation
Reporter: Stephan Ewen
Assignee: Stephan Ewen
 Fix For: 1.4.0






--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (FLINK-7840) Shade Akka's Netty Dependency

2017-10-14 Thread Stephan Ewen (JIRA)
Stephan Ewen created FLINK-7840:
---

 Summary: Shade Akka's Netty Dependency
 Key: FLINK-7840
 URL: https://issues.apache.org/jira/browse/FLINK-7840
 Project: Flink
  Issue Type: Improvement
  Components: Build System
Reporter: Stephan Ewen
Assignee: Stephan Ewen
 Fix For: 1.4.0


In order to avoid clashes between different Netty versions we should shade 
Akka's Netty away.

These dependency version clashes manifest themselves in very subtle ways, like 
occasional deadlocks.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (FLINK-7768) Load File Systems via Java Service abstraction

2017-10-05 Thread Stephan Ewen (JIRA)
Stephan Ewen created FLINK-7768:
---

 Summary: Load File Systems via Java Service abstraction
 Key: FLINK-7768
 URL: https://issues.apache.org/jira/browse/FLINK-7768
 Project: Flink
  Issue Type: Improvement
  Components: Core
Reporter: Stephan Ewen
 Fix For: 1.4.0






--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (FLINK-7767) Avoid loading Hadoop conf dynamically at runtime

2017-10-05 Thread Stephan Ewen (JIRA)
Stephan Ewen created FLINK-7767:
---

 Summary: Avoid loading Hadoop conf dynamically at runtime
 Key: FLINK-7767
 URL: https://issues.apache.org/jira/browse/FLINK-7767
 Project: Flink
  Issue Type: Improvement
  Components: Streaming Connectors
Reporter: Stephan Ewen
Assignee: Stephan Ewen
 Fix For: 1.4.0


The bucketing sink dynamically loads the Hadoop configuration in various places.

The result of that configuration is not always predictable, as it tries to 
automagically discover the Hadoop config files.

A better approach is to rely on the Flink configuration to find the Hadoop 
configuration, or to directly use the Hadoop configuration used by the Hadoop 
file systems.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (FLINK-7766) Remove obsolete reflection for hflush on HDFS

2017-10-05 Thread Stephan Ewen (JIRA)
Stephan Ewen created FLINK-7766:
---

 Summary: Remove obsolete reflection for hflush on HDFS
 Key: FLINK-7766
 URL: https://issues.apache.org/jira/browse/FLINK-7766
 Project: Flink
  Issue Type: Improvement
  Components: Streaming Connectors
Reporter: Stephan Ewen
Assignee: Stephan Ewen
 Fix For: 1.4.0


This code originally existed for compatibility with Hadoop 1.

Since Hadoop 1 support is dropped, this is no longer necessary.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (FLINK-7662) Remove unnecessary packaged licenses

2017-09-21 Thread Stephan Ewen (JIRA)
Stephan Ewen created FLINK-7662:
---

 Summary: Remove unnecessary packaged licenses
 Key: FLINK-7662
 URL: https://issues.apache.org/jira/browse/FLINK-7662
 Project: Flink
  Issue Type: Improvement
  Components: Build System
Reporter: Stephan Ewen
Assignee: Stephan Ewen
 Fix For: 1.4.0


With the new shading approach, we no longer shade ASM into Flink artifacts, so 
we do not need to package the ASM license into those artifacts any more.

Instead, a shaded ASM artifact already containing a packaged license is used in 
the distribution build.





[jira] [Created] (FLINK-7419) Shade jackson dependency in flink-avro

2017-08-10 Thread Stephan Ewen (JIRA)
Stephan Ewen created FLINK-7419:
---

 Summary: Shade jackson dependency in flink-avro
 Key: FLINK-7419
 URL: https://issues.apache.org/jira/browse/FLINK-7419
 Project: Flink
  Issue Type: Sub-task
  Components: Build System
Reporter: Stephan Ewen
 Fix For: 1.4.0


Avro uses {{org.codehaus.jackson}} which also exists in multiple incompatible 
versions. We should shade it to 
{{org.apache.flink.shaded.avro.org.codehaus.jackson}}.
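A relocation of this kind is typically configured with the maven-shade-plugin. The following is an illustrative sketch, not the actual flink-avro build configuration, which may differ in phase, filters, and surrounding setup:

```xml
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <executions>
    <execution>
      <phase>package</phase>
      <goals><goal>shade</goal></goals>
      <configuration>
        <relocations>
          <relocation>
            <!-- Rewrite all references to the jackson packages Avro uses -->
            <pattern>org.codehaus.jackson</pattern>
            <shadedPattern>org.apache.flink.shaded.avro.org.codehaus.jackson</shadedPattern>
          </relocation>
        </relocations>
      </configuration>
    </execution>
  </executions>
</plugin>
```

With the relocation in place, user applications can depend on any jackson version without clashing with the copy bundled for Avro.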





[jira] [Created] (FLINK-7417) Create flink-shaded-jackson

2017-08-10 Thread Stephan Ewen (JIRA)
Stephan Ewen created FLINK-7417:
---

 Summary: Create flink-shaded-jackson
 Key: FLINK-7417
 URL: https://issues.apache.org/jira/browse/FLINK-7417
 Project: Flink
  Issue Type: Sub-task
  Components: Build System
Reporter: Stephan Ewen
 Fix For: 1.4.0


The {{com.fasterxml:jackson}} library is another culprit of frequent conflicts 
that we need to shade away.





[jira] [Created] (FLINK-7418) Replace all uses of jackson with flink-shaded-jackson

2017-08-10 Thread Stephan Ewen (JIRA)
Stephan Ewen created FLINK-7418:
---

 Summary: Replace all uses of jackson with flink-shaded-jackson
 Key: FLINK-7418
 URL: https://issues.apache.org/jira/browse/FLINK-7418
 Project: Flink
  Issue Type: Sub-task
  Components: Build System
Reporter: Stephan Ewen
 Fix For: 1.4.0


Jackson is currently used to create JSON responses in the web UI, in the future 
possibly for the client REST communication.





[jira] [Created] (FLINK-7266) Don't attempt to delete parent directory on S3

2017-07-25 Thread Stephan Ewen (JIRA)
Stephan Ewen created FLINK-7266:
---

 Summary: Don't attempt to delete parent directory on S3
 Key: FLINK-7266
 URL: https://issues.apache.org/jira/browse/FLINK-7266
 Project: Flink
  Issue Type: Bug
  Components: Core
Affects Versions: 1.3.1
Reporter: Stephan Ewen
Assignee: Stephan Ewen
 Fix For: 1.4.0, 1.3.2


Currently, every attempted release of an S3 state object also checks if the 
"parent directory" is empty and then tries to delete it.

Not only is that unnecessary on S3, but it is prohibitively expensive and for 
example causes S3 to throttle calls by the JobManager on checkpoint cleanup.

The {{FileState}} must only attempt parent directory cleanup when operating 
against real file systems, not when operating against object stores.
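The guard could look roughly like this sketch. The {{FileSystemKind}} enum and {{getKind()}} method follow the direction of FLINK-7265 and are assumptions here, not the final API:

```java
public class ParentDirCleanup {

    // Assumed classification, in the spirit of FLINK-7265.
    enum FileSystemKind { FILE_SYSTEM, OBJECT_STORE }

    interface MiniFileSystem {
        FileSystemKind getKind();
        boolean deleteIfEmpty(String dir); // true if a delete call was issued
    }

    /** Only real file systems get the parent-directory cleanup attempt. */
    static boolean cleanupParentDir(MiniFileSystem fs, String parentDir) {
        if (fs.getKind() == FileSystemKind.OBJECT_STORE) {
            // On S3-like stores, "directories" are just key prefixes; deleting
            // them is pointless, and the extra requests can trigger throttling.
            return false;
        }
        return fs.deleteIfEmpty(parentDir);
    }

    public static void main(String[] args) {
        MiniFileSystem s3 = new MiniFileSystem() {
            public FileSystemKind getKind() { return FileSystemKind.OBJECT_STORE; }
            public boolean deleteIfEmpty(String dir) { return true; }
        };
        MiniFileSystem hdfs = new MiniFileSystem() {
            public FileSystemKind getKind() { return FileSystemKind.FILE_SYSTEM; }
            public boolean deleteIfEmpty(String dir) { return true; }
        };
        System.out.println("s3 delete attempted: " + cleanupParentDir(s3, "/bucket/chk-1"));
        System.out.println("hdfs delete attempted: " + cleanupParentDir(hdfs, "/chk-1"));
    }
}
```

The decision happens at the cleanup call site, so object stores never see the extra delete requests at all.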





[jira] [Created] (FLINK-7265) FileSystems should describe their kind and consistency level

2017-07-25 Thread Stephan Ewen (JIRA)
Stephan Ewen created FLINK-7265:
---

 Summary: FileSystems should describe their kind and consistency 
level
 Key: FLINK-7265
 URL: https://issues.apache.org/jira/browse/FLINK-7265
 Project: Flink
  Issue Type: Improvement
  Components: Core
Affects Versions: 1.3.1
Reporter: Stephan Ewen
Assignee: Stephan Ewen
 Fix For: 1.4.0, 1.3.2


Currently, all {{FileSystem}} types look the same to Flink.

However, certain operations should only be executed on certain kinds of file 
systems.

For example, it makes no sense to attempt to delete an empty parent directory 
on S3, because there are no such things as directories, only hierarchical 
naming in the keys (file names).
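A minimal sketch of such a self-description follows. The names mirror the direction of this ticket but are assumptions, not the final API:

```java
public class FileSystemKindDemo {

    /** Assumed classification, per this ticket's proposal. */
    enum FileSystemKind {
        /** A real hierarchical file system (e.g. HDFS, local FS). */
        FILE_SYSTEM,
        /** An object store with key-based "paths" (e.g. S3). */
        OBJECT_STORE
    }

    /** A file system that advertises what kind of store it is. */
    interface SelfDescribingFileSystem {
        FileSystemKind getKind();
    }

    static boolean supportsRealDirectories(SelfDescribingFileSystem fs) {
        // Directory-level operations only make sense on true file systems.
        return fs.getKind() == FileSystemKind.FILE_SYSTEM;
    }

    public static void main(String[] args) {
        SelfDescribingFileSystem s3 = () -> FileSystemKind.OBJECT_STORE;
        System.out.println("s3 has real directories: " + supportsRealDirectories(s3));
    }
}
```

Callers such as checkpoint cleanup can then branch on the kind instead of hard-coding scheme names like {{s3://}}.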





[jira] [Created] (FLINK-7263) Improve Pull Request Template

2017-07-25 Thread Stephan Ewen (JIRA)
Stephan Ewen created FLINK-7263:
---

 Summary: Improve Pull Request Template
 Key: FLINK-7263
 URL: https://issues.apache.org/jira/browse/FLINK-7263
 Project: Flink
  Issue Type: Improvement
  Components: Documentation
Reporter: Stephan Ewen
Assignee: Stephan Ewen


As discussed in the mailing list, the suggestion is to update the pull request 
template as follows:


*Thank you very much for contributing to Apache Flink - we are happy that you 
want to help us improve Flink. To help the community review your contribution in 
the best possible way, please go through the checklist below, which will get 
the contribution into a shape in which it can be best reviewed.*

*Please understand that we do not do this to make contributions to Flink a 
hassle. In order to uphold a high standard of quality for code contributions, 
while at the same time managing a large number of contributions, we need 
contributors to prepare the contributions well, and give reviewers enough 
contextual information for the review. Please also understand that 
contributions that do not follow this guide will take longer to review and thus 
typically be picked up with lower priority by the community.*

## Contribution Checklist

  - Make sure that the pull request corresponds to a [JIRA 
issue](https://issues.apache.org/jira/projects/FLINK/issues). Exceptions are 
made for typos in JavaDoc or documentation files, which need no JIRA issue.
  
  - Name the pull request in the form "[FLINK-1234] [component] Title of the 
pull request", where *FLINK-1234* should be replaced by the actual issue 
number. Skip *component* if you are unsure about which is the best component.
  Typo fixes that have no associated JIRA issue should be named following this 
pattern: `[hotfix] [docs] Fix typo in event time introduction` or `[hotfix] 
[javadocs] Expand JavaDoc for PunctuatedWatermarkGenerator`.

  - Fill out the template below to describe the changes contributed by the pull 
request. That will give reviewers the context they need to do the review.
  
  - Make sure that the change passes the automated tests, i.e., `mvn clean 
verify` 

  - Each pull request should address only one issue, not mix up code from 
multiple issues.
  
  - Each commit in the pull request has a meaningful commit message (including 
the JIRA id)

  - Once all items of the checklist are addressed, remove the above text and 
this checklist, leaving only the filled out template below.


**(The sections below can be removed for hotfixes of typos)**

## What is the purpose of the change

*(For example: This pull request makes task deployment go through the blob 
server, rather than through RPC. That way we avoid re-transferring them on each 
deployment (during recovery).)*


## Brief change log

*(for example:)*
  - *The TaskInfo is stored in the blob store on job creation time as a 
persistent artifact*
  - *Deployments RPC transmits only the blob storage reference*
  - *TaskManagers retrieve the TaskInfo from the blob cache*


## Verifying this change

*(Please pick either of the following options)*

This change is a trivial rework / code cleanup without any test coverage.

*(or)*

This change is already covered by existing tests, such as *(please describe 
tests)*.

*(or)*

This change added tests and can be verified as follows:

*(example:)*
  - *Added integration tests for end-to-end deployment with large payloads 
(100MB)*
  - *Extended integration test for recovery after master (JobManager) failure*
  - *Added test that validates that TaskInfo is transferred only once across 
recoveries*
  - *Manually verified the change by running a 4 node cluster with 2 JobManagers 
and 4 TaskManagers, a stateful streaming program, and killing one JobManager 
and two TaskManagers during the execution, verifying that recovery happens 
correctly.*

## Does this pull request potentially affect one of the following parts:

  - Dependencies (does it add or upgrade a dependency): **(yes / no)**
  - The public API, i.e., is any changed class annotated with 
`@Public(Evolving)`: **(yes / no)**
  - The serializers: **(yes / no / don't know)**
  - The runtime per-record code paths (performance sensitive): **(yes / no / 
don't know)**
  - Anything that affects deployment or recovery: JobManager (and its 
components), Checkpointing, Yarn/Mesos, ZooKeeper: **(yes / no / don't know)**:

## Documentation

  - Does this pull request introduce a new feature? **(yes / no)**
  - If yes, how is the feature documented? **(not applicable / docs / JavaDocs 
/ not documented)**






[jira] [Created] (FLINK-7253) Remove all 'assume Java 8' code in tests

2017-07-24 Thread Stephan Ewen (JIRA)
Stephan Ewen created FLINK-7253:
---

 Summary: Remove all 'assume Java 8' code in tests
 Key: FLINK-7253
 URL: https://issues.apache.org/jira/browse/FLINK-7253
 Project: Flink
  Issue Type: Sub-task
Reporter: Stephan Ewen








[jira] [Created] (FLINK-7251) Merge the flink-java8 project into flink-core

2017-07-24 Thread Stephan Ewen (JIRA)
Stephan Ewen created FLINK-7251:
---

 Summary: Merge the flink-java8 project into flink-core
 Key: FLINK-7251
 URL: https://issues.apache.org/jira/browse/FLINK-7251
 Project: Flink
  Issue Type: Sub-task
Reporter: Stephan Ewen








[jira] [Created] (FLINK-7252) Remove Flink Futures or back them by CompletableFutures

2017-07-24 Thread Stephan Ewen (JIRA)
Stephan Ewen created FLINK-7252:
---

 Summary: Remove Flink Futures or back them by CompletableFutures
 Key: FLINK-7252
 URL: https://issues.apache.org/jira/browse/FLINK-7252
 Project: Flink
  Issue Type: Sub-task
Reporter: Stephan Ewen








[jira] [Created] (FLINK-7249) Bump Java version in build plugins

2017-07-24 Thread Stephan Ewen (JIRA)
Stephan Ewen created FLINK-7249:
---

 Summary: Bump Java version in build plugins
 Key: FLINK-7249
 URL: https://issues.apache.org/jira/browse/FLINK-7249
 Project: Flink
  Issue Type: Sub-task
Reporter: Stephan Ewen








[jira] [Created] (FLINK-7250) Drop the jdk8 build profile

2017-07-24 Thread Stephan Ewen (JIRA)
Stephan Ewen created FLINK-7250:
---

 Summary: Drop the jdk8 build profile
 Key: FLINK-7250
 URL: https://issues.apache.org/jira/browse/FLINK-7250
 Project: Flink
  Issue Type: Sub-task
Reporter: Stephan Ewen








[jira] [Created] (FLINK-7231) SlotSharingGroups are not always released in time for new restarts

2017-07-19 Thread Stephan Ewen (JIRA)
Stephan Ewen created FLINK-7231:
---

 Summary: SlotSharingGroups are not always released in time for new 
restarts
 Key: FLINK-7231
 URL: https://issues.apache.org/jira/browse/FLINK-7231
 Project: Flink
  Issue Type: Bug
  Components: Distributed Coordination
Affects Versions: 1.3.1
Reporter: Stephan Ewen
Assignee: Stephan Ewen
 Fix For: 1.4.0, 1.3.2


In the case where there are not enough resources to schedule the streaming 
program, a race condition can lead to a sequence of the following errors:

{code}
java.lang.IllegalStateException: SlotSharingGroup cannot clear task assignment, 
group still has allocated resources.
{code}

This eventually recovers, but may involve many fast restart attempts before 
doing so.

The root cause is that slots are not cleared before the next restart attempt.
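A sketch of the intended ordering, with hypothetical names (this is not the actual scheduler code): all slots of the old attempt are released before the vertices are reset for the new attempt, because resetting while slots are still assigned is exactly what raises the exception above.

```java
public class RestartSequence {

    // Hypothetical stand-in for the slot-sharing bookkeeping.
    interface SlotPool {
        void releaseAll();
        boolean allReleased();
    }

    /** Release old resources first, then reset for the new execution attempt. */
    static void restart(SlotPool slots, Runnable resetForNewExecution) {
        slots.releaseAll();
        if (!slots.allReleased()) {
            throw new IllegalStateException(
                "SlotSharingGroup cannot clear task assignment, "
                + "group still has allocated resources.");
        }
        resetForNewExecution.run();
    }

    public static void main(String[] args) {
        final boolean[] released = {false};
        SlotPool pool = new SlotPool() {
            public void releaseAll() { released[0] = true; }
            public boolean allReleased() { return released[0]; }
        };
        restart(pool, () -> System.out.println("reset for new execution"));
    }
}
```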





[jira] [Created] (FLINK-7216) ExecutionGraph can perform concurrent global restarts to scheduling

2017-07-17 Thread Stephan Ewen (JIRA)
Stephan Ewen created FLINK-7216:
---

 Summary: ExecutionGraph can perform concurrent global restarts to 
scheduling
 Key: FLINK-7216
 URL: https://issues.apache.org/jira/browse/FLINK-7216
 Project: Flink
  Issue Type: Bug
  Components: Distributed Coordination
Affects Versions: 1.3.1, 1.2.1
Reporter: Stephan Ewen
Assignee: Stephan Ewen
Priority: Blocker
 Fix For: 1.4.0, 1.3.2


Because ExecutionGraph restarts happen asynchronously and possibly delayed, it 
can happen in rare corner cases that two restarts are attempted concurrently, 
in which case some structures on the Execution Graph undergo a concurrent 
access:

Sample stack trace:
{code}
WARN  org.apache.flink.runtime.executiongraph.ExecutionGraph - Failed to 
restart the job.
java.lang.IllegalStateException: SlotSharingGroup cannot clear task assignment, 
group still has allocated resources.
at 
org.apache.flink.runtime.jobmanager.scheduler.SlotSharingGroup.clearTaskAssignment(SlotSharingGroup.java:78)
at 
org.apache.flink.runtime.executiongraph.ExecutionJobVertex.resetForNewExecution(ExecutionJobVertex.java:535)
at 
org.apache.flink.runtime.executiongraph.ExecutionGraph.restart(ExecutionGraph.java:1151)
at 
org.apache.flink.runtime.executiongraph.restart.ExecutionGraphRestarter$1.call(ExecutionGraphRestarter.java:40)
at akka.dispatch.Futures$$anonfun$future$1.apply(Future.scala:95)
at 
scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24)
at 
scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:748)
{code}

The solution is to strictly guard against "subsumed" restarts via the 
{{globalModVersion}} in a similar way as we fence local restarts against global 
restarts.
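The fencing idea can be sketched as follows. This is a simplified illustration, not the actual ExecutionGraph code; in Flink the checks would run in the job master's single-threaded context rather than rely on the atomic alone:

```java
import java.util.concurrent.atomic.AtomicLong;

public class RestartFencing {

    private final AtomicLong globalModVersion = new AtomicLong(0);

    /** Begin a restart: bump the version and remember which attempt this is. */
    long startRestart() {
        return globalModVersion.incrementAndGet();
    }

    /** A (possibly delayed) restart callback only proceeds if no newer
     *  global restart has subsumed it in the meantime. */
    boolean tryExecuteRestart(long expectedVersion, Runnable restartAction) {
        if (globalModVersion.get() != expectedVersion) {
            return false; // subsumed by a newer global restart - drop silently
        }
        restartAction.run();
        return true;
    }

    public static void main(String[] args) {
        RestartFencing graph = new RestartFencing();
        long first = graph.startRestart();
        long second = graph.startRestart(); // a newer restart subsumes the first
        System.out.println("first runs: " + graph.tryExecuteRestart(first, () -> {}));
        System.out.println("second runs: " + graph.tryExecuteRestart(second, () -> {}));
    }
}
```

The stale callback simply becomes a no-op, so the concurrent access to the slot-sharing structures never happens.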




