[jira] [Created] (FLINK-7361) flink-web doesn't build with ruby 2.4

2017-08-03 Thread Nico Kruber (JIRA)
Nico Kruber created FLINK-7361:
--

 Summary: flink-web doesn't build with ruby 2.4
 Key: FLINK-7361
 URL: https://issues.apache.org/jira/browse/FLINK-7361
 Project: Flink
  Issue Type: Bug
  Components: Project Website
Reporter: Nico Kruber
Assignee: Nico Kruber


The dependencies pulled in by the old jekyll version do not build with ruby 2.4 
and fail with something like

{code}
yajl_ext.c:881:22: error: 'rb_cFixnum' undeclared (first use in this function); 
did you mean 'rb_isalnum'?
{code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (FLINK-7355) YARNSessionFIFOITCase#testfullAlloc does not run anymore

2017-08-02 Thread Nico Kruber (JIRA)
Nico Kruber created FLINK-7355:
--

 Summary: YARNSessionFIFOITCase#testfullAlloc does not run anymore
 Key: FLINK-7355
 URL: https://issues.apache.org/jira/browse/FLINK-7355
 Project: Flink
  Issue Type: Bug
  Components: Tests, YARN
Affects Versions: 1.4.0, 1.3.2
Reporter: Nico Kruber
Priority: Minor


{{YARNSessionFIFOITCase#testfullAlloc}} is a test case that is ignored because 
of a too high resource consumption but if run manually, it fails with

{code}
Error while deploying YARN cluster: Couldn't deploy Yarn session cluster
java.lang.RuntimeException: Couldn't deploy Yarn session cluster
at 
org.apache.flink.yarn.AbstractYarnClusterDescriptor.deploySessionCluster(AbstractYarnClusterDescriptor.java:367)
at 
org.apache.flink.yarn.cli.FlinkYarnSessionCli.run(FlinkYarnSessionCli.java:663)
at org.apache.flink.yarn.YarnTestBase$Runner.run(YarnTestBase.java:680)
Caused by: java.lang.IllegalArgumentException: The configuration value 
'containerized.heap-cutoff-min' is higher (600) than the requested amount of 
memory 256
at org.apache.flink.yarn.Utils.calculateHeapSize(Utils.java:101)
at 
org.apache.flink.yarn.AbstractYarnClusterDescriptor.setupApplicationMasterContainer(AbstractYarnClusterDescriptor.java:1356)
at 
org.apache.flink.yarn.AbstractYarnClusterDescriptor.startAppMaster(AbstractYarnClusterDescriptor.java:840)
at 
org.apache.flink.yarn.AbstractYarnClusterDescriptor.deployInternal(AbstractYarnClusterDescriptor.java:456)
at 
org.apache.flink.yarn.AbstractYarnClusterDescriptor.deploySessionCluster(AbstractYarnClusterDescriptor.java:362)
... 2 more
{code}

in current master and

{code}
Error while starting the YARN Client: The JobManager memory (256) is below the 
minimum required memory amount of 768 MB
java.lang.IllegalArgumentException: The JobManager memory (256) is below the 
minimum required memory amount of 768 MB
at 
org.apache.flink.yarn.AbstractYarnClusterDescriptor.setJobManagerMemory(AbstractYarnClusterDescriptor.java:187)
at 
org.apache.flink.yarn.cli.FlinkYarnSessionCli.createDescriptor(FlinkYarnSessionCli.java:314)
at 
org.apache.flink.yarn.cli.FlinkYarnSessionCli.run(FlinkYarnSessionCli.java:622)
at org.apache.flink.yarn.YarnTestBase$Runner.run(YarnTestBase.java:645)
{code}

in Flink 1.3.2 RC2



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (FLINK-7354) test instability in LocalFlinkMiniClusterITCase#testLocalFlinkMiniClusterWithMultipleTaskManagers

2017-08-02 Thread Nico Kruber (JIRA)
Nico Kruber created FLINK-7354:
--

 Summary: test instability in 
LocalFlinkMiniClusterITCase#testLocalFlinkMiniClusterWithMultipleTaskManagers
 Key: FLINK-7354
 URL: https://issues.apache.org/jira/browse/FLINK-7354
 Project: Flink
  Issue Type: Bug
  Components: Tests
Affects Versions: 1.2.1, 1.1.4, 1.4.0, 1.3.2
Reporter: Nico Kruber
Assignee: Nico Kruber
Priority: Critical


During {{mvn clean install}} on the 1.3.2 RC2, I found an inconsistently 
failing test at 
{{LocalFlinkMiniClusterITCase#testLocalFlinkMiniClusterWithMultipleTaskManagers}}:

{code}
testLocalFlinkMiniClusterWithMultipleTaskManagers(org.apache.flink.test.runtime.minicluster.LocalFlinkMiniClusterITCase)
  Time elapsed: 34.978 sec  <<< FAILURE!
java.lang.AssertionError: Thread Thread[initialSeedUniquifierGenerator,5,main] 
was started by the mini cluster, but not shut down
at org.junit.Assert.fail(Assert.java:88)
at 
org.apache.flink.test.runtime.minicluster.LocalFlinkMiniClusterITCase.testLocalFlinkMiniClusterWithMultipleTaskManagers(LocalFlinkMiniClusterITCase.java:168)
{code}

Searching the web for that error yields one previous thread on the dev-list, so 
this seems to be valid for quite old versions of flink, too, but apparently, 
was never solved:
https://lists.apache.org/thread.html/07ce439bf6d358bd3139541b52ef6b8e8af249a27e09ae10b6698f81@%3Cdev.flink.apache.org%3E



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (FLINK-7352) ExecutionGraphRestartTest timeouts

2017-08-02 Thread Nico Kruber (JIRA)
Nico Kruber created FLINK-7352:
--

 Summary: ExecutionGraphRestartTest timeouts
 Key: FLINK-7352
 URL: https://issues.apache.org/jira/browse/FLINK-7352
 Project: Flink
  Issue Type: Bug
  Components: Distributed Coordination, Tests
Affects Versions: 1.4.0, 1.3.2
Reporter: Nico Kruber
Priority: Critical


Recently, I received timeouts from some tests in {{ExecutionGraphRestartTest}} 
like this

{code}
Tests in error: 
  ExecutionGraphRestartTest.testConcurrentLocalFailAndRestart:638 » Timeout
{code}

This particular instance is from 1.3.2 RC2 and stuck in 
{{ExecutionGraphTestUtils#waitUntilDeployedAndSwitchToRunning()}} but I also 
had instances stuck in {{ExecutionGraphTestUtils#waitUntilJobStatus}}.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (FLINK-7351) test instability in JobClientActorRecoveryITCase#testJobClientRecovery

2017-08-02 Thread Nico Kruber (JIRA)
Nico Kruber created FLINK-7351:
--

 Summary: test instability in 
JobClientActorRecoveryITCase#testJobClientRecovery
 Key: FLINK-7351
 URL: https://issues.apache.org/jira/browse/FLINK-7351
 Project: Flink
  Issue Type: Bug
  Components: Job-Submission, Tests
Affects Versions: 1.3.2
Reporter: Nico Kruber
Priority: Minor


On a 16-core VM, the following test failed during {{mvn clean verify}}

{code}
Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 22.814 sec <<< 
FAILURE! - in org.apache.flink.runtime.client.JobClientActorRecoveryITCase
testJobClientRecovery(org.apache.flink.runtime.client.JobClientActorRecoveryITCase)
  Time elapsed: 21.299 sec  <<< ERROR!
org.apache.flink.runtime.client.JobExecutionException: Job execution failed.
at 
org.apache.flink.runtime.jobmanager.JobManager$$anonfun$handleMessage$1$$anonfun$applyOrElse$7.apply$mcV$sp(JobManager.scala:933)
at 
org.apache.flink.runtime.jobmanager.JobManager$$anonfun$handleMessage$1$$anonfun$applyOrElse$7.apply(JobManager.scala:876)
at 
org.apache.flink.runtime.jobmanager.JobManager$$anonfun$handleMessage$1$$anonfun$applyOrElse$7.apply(JobManager.scala:876)
at 
scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24)
at 
scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24)
at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:40)
at 
akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:397)
at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
at 
scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
at 
scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
at 
scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
Caused by: 
org.apache.flink.runtime.jobmanager.scheduler.NoResourceAvailableException: Not 
enough free slots available to run the job. You can decrease the operator 
parallelism or increase the number of slots per TaskManager in the 
configuration. Resources available to scheduler: Number of instances=0, total 
number of slots=0, available slots=0
at 
org.apache.flink.runtime.jobmanager.scheduler.Scheduler.scheduleTask(Scheduler.java:334)
at 
org.apache.flink.runtime.jobmanager.scheduler.Scheduler.allocateSlot(Scheduler.java:139)
at 
org.apache.flink.runtime.executiongraph.Execution.allocateSlotForExecution(Execution.java:368)
at 
org.apache.flink.runtime.executiongraph.Execution.scheduleForExecution(Execution.java:309)
at 
org.apache.flink.runtime.executiongraph.ExecutionVertex.scheduleForExecution(ExecutionVertex.java:596)
at 
org.apache.flink.runtime.executiongraph.ExecutionJobVertex.scheduleAll(ExecutionJobVertex.java:450)
at 
org.apache.flink.runtime.executiongraph.ExecutionGraph.scheduleLazy(ExecutionGraph.java:834)
at 
org.apache.flink.runtime.executiongraph.ExecutionGraph.scheduleForExecution(ExecutionGraph.java:814)
at 
org.apache.flink.runtime.jobmanager.JobManager$$anonfun$org$apache$flink$runtime$jobmanager$JobManager$$submitJob$1.apply$mcV$sp(JobManager.scala:1425)
at 
org.apache.flink.runtime.jobmanager.JobManager$$anonfun$org$apache$flink$runtime$jobmanager$JobManager$$submitJob$1.apply(JobManager.scala:1372)
at 
org.apache.flink.runtime.jobmanager.JobManager$$anonfun$org$apache$flink$runtime$jobmanager$JobManager$$submitJob$1.apply(JobManager.scala:1372)
at 
scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24)
at 
scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24)
at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:40)
at 
akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:397)
at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
at 
scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
at 
scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
at 
scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)

{code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (FLINK-7346) EventTimeWindowCheckpointingITCase.testPreAggregatedSlidingTimeWindow killed by maven watchdog on Travis

2017-08-02 Thread Nico Kruber (JIRA)
Nico Kruber created FLINK-7346:
--

 Summary: 
EventTimeWindowCheckpointingITCase.testPreAggregatedSlidingTimeWindow killed by 
maven watchdog on Travis
 Key: FLINK-7346
 URL: https://issues.apache.org/jira/browse/FLINK-7346
 Project: Flink
  Issue Type: Bug
  Components: DataStream API, Tests
Affects Versions: 1.4.0
Reporter: Nico Kruber
Assignee: Nico Kruber


Several test runs fail with the watchdog killing the tests after receiving no 
output for 300s showing 
{{EventTimeWindowCheckpointingITCase.testPreAggregatedSlidingTimeWindow}} as 
one of the tests, sometimes with another test running in parallel. It does not 
seem to be hanging though and the only reason for this behaviour may be that 
the tests, especially with RocksDB, take very long with no output by each of 
the test methods in {{AbstractEventTimeWindowCheckpointingITCase}}. Thus adding 
output per method should fix the spurious test failures.

Some failing instances:
https://travis-ci.org/apache/flink/jobs/259460738
https://travis-ci.org/apache/flink/jobs/259748656



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (FLINK-7316) always use off-heap network buffers

2017-07-31 Thread Nico Kruber (JIRA)
Nico Kruber created FLINK-7316:
--

 Summary: always use off-heap network buffers
 Key: FLINK-7316
 URL: https://issues.apache.org/jira/browse/FLINK-7316
 Project: Flink
  Issue Type: Sub-task
  Components: Core, Network
Affects Versions: 1.4.0
Reporter: Nico Kruber
Assignee: Nico Kruber


In order to send flink buffers through netty into the network, we need to make 
the buffers use off-heap memory. Otherwise, there will be a hidden copy 
happening in the NIO stack.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (FLINK-7315) use flink's buffers in netty

2017-07-31 Thread Nico Kruber (JIRA)
Nico Kruber created FLINK-7315:
--

 Summary: use flink's buffers in netty
 Key: FLINK-7315
 URL: https://issues.apache.org/jira/browse/FLINK-7315
 Project: Flink
  Issue Type: Improvement
  Components: Core, Network
Affects Versions: 1.4.0
Reporter: Nico Kruber
Assignee: Nico Kruber


The goal of this change is to avoid the step in the channel encoder and decoder 
pipelines where flink buffers are copied into netty buffers. Instead, netty 
should directly send flink buffers to the network.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (FLINK-7312) activate checkstyle for flink/core/memory/*

2017-07-31 Thread Nico Kruber (JIRA)
Nico Kruber created FLINK-7312:
--

 Summary: activate checkstyle for flink/core/memory/*
 Key: FLINK-7312
 URL: https://issues.apache.org/jira/browse/FLINK-7312
 Project: Flink
  Issue Type: Improvement
  Components: Checkstyle, Core
Affects Versions: 1.4.0
Reporter: Nico Kruber
Assignee: Nico Kruber
Priority: Minor






--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (FLINK-7311) refrain from using fail(Exception#getMessage()) in core memory tests

2017-07-31 Thread Nico Kruber (JIRA)
Nico Kruber created FLINK-7311:
--

 Summary: refrain from using fail(Exception#getMessage()) in core 
memory tests
 Key: FLINK-7311
 URL: https://issues.apache.org/jira/browse/FLINK-7311
 Project: Flink
  Issue Type: Improvement
  Components: Core, Tests
Affects Versions: 1.4.0
Reporter: Nico Kruber
Assignee: Nico Kruber
Priority: Minor


Most unit tests in the {{flink/core/memory/}} domain still relies on a code 
pattern like this:
{code}
try {
// ...
} catch (Exception e) {
e.printStackTrace();
fail(e.getMessage());
}
{code}

This does hides the exception details and should be removed.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (FLINK-7310) always use HybridMemorySegment

2017-07-31 Thread Nico Kruber (JIRA)
Nico Kruber created FLINK-7310:
--

 Summary: always use HybridMemorySegment
 Key: FLINK-7310
 URL: https://issues.apache.org/jira/browse/FLINK-7310
 Project: Flink
  Issue Type: Improvement
  Components: Core
Affects Versions: 1.4.0
Reporter: Nico Kruber
Assignee: Nico Kruber


For future changes to the network buffers (sending our own off-heap buffers 
through to netty), we cannot use {{HeapMemorySegment}} anymore and need to rely 
on {{HybridMemorySegment}} instead.

We should thus drop any code that loads the {{HeapMemorySegment}} (it is still 
available if needed) in favour of the {{HybridMemorySegment}} which is able to 
work on both heap and off-heap memory.

FYI: For the performance penalty of this change compared to using 
{{HeapMemorySegment}} alone, see this interesting blob article (from 2015):
https://flink.apache.org/news/2015/09/16/off-heap-memory.html



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (FLINK-7287) test instability in Kafka010ITCase.testCommitOffsetsToKafka

2017-07-27 Thread Nico Kruber (JIRA)
Nico Kruber created FLINK-7287:
--

 Summary: test instability in 
Kafka010ITCase.testCommitOffsetsToKafka
 Key: FLINK-7287
 URL: https://issues.apache.org/jira/browse/FLINK-7287
 Project: Flink
  Issue Type: Improvement
  Components: Kafka Connector, Tests
Affects Versions: 1.4.0
Reporter: Nico Kruber
Assignee: Nico Kruber


sporadically, {{Kafka010ITCase.testCommitOffsetsToKafka}} seems to be failing, 
e.g. 

{code}

Test 
testCommitOffsetsToKafka(org.apache.flink.streaming.connectors.kafka.Kafka010ITCase)
 is running.

12:29:31,597 INFO  org.apache.flink.streaming.connectors.kafka.KafkaTestBase
 - 
===
== Writing sequence of 50 into testCommitOffsetsToKafkaTopic with p=3
===
12:29:31,597 INFO  org.apache.flink.streaming.connectors.kafka.KafkaTestBase
 - Writing attempt #1
12:29:31,598 INFO  
org.apache.flink.streaming.connectors.kafka.KafkaTestEnvironmentImpl  - 
Creating topic testCommitOffsetsToKafkaTopic-1
12:29:31,598 INFO  org.I0Itec.zkclient.ZkEventThread
 - Starting ZkClient event thread.
12:29:31,599 INFO  org.I0Itec.zkclient.ZkClient 
 - Waiting for keeper state SyncConnected
12:29:31,601 INFO  org.I0Itec.zkclient.ZkClient 
 - zookeeper state changed (SyncConnected)
12:29:31,615 INFO  org.I0Itec.zkclient.ZkEventThread
 - Terminate ZkClient event thread.
12:29:31,719 INFO  org.I0Itec.zkclient.ZkEventThread
 - Starting ZkClient event thread.
12:29:31,722 INFO  org.I0Itec.zkclient.ZkClient 
 - Waiting for keeper state SyncConnected
12:29:31,728 INFO  org.I0Itec.zkclient.ZkClient 
 - zookeeper state changed (SyncConnected)
12:29:31,729 INFO  org.I0Itec.zkclient.ZkEventThread
 - Terminate ZkClient event thread.
12:29:31,832 INFO  
org.apache.flink.streaming.connectors.kafka.FlinkKafkaProducerBase  - Starting 
FlinkKafkaProducer (3/3) to produce into default topic 
testCommitOffsetsToKafkaTopic-1
12:29:31,840 INFO  
org.apache.flink.streaming.connectors.kafka.FlinkKafkaProducerBase  - Starting 
FlinkKafkaProducer (2/3) to produce into default topic 
testCommitOffsetsToKafkaTopic-1
12:29:31,842 WARN  
org.apache.flink.streaming.connectors.kafka.FlinkKafkaProducerBase  - Flushing 
on checkpoint is enabled, but checkpointing is not enabled. Disabling flushing.
12:29:31,844 INFO  
org.apache.flink.streaming.connectors.kafka.FlinkKafkaProducerBase  - Starting 
FlinkKafkaProducer (1/3) to produce into default topic 
testCommitOffsetsToKafkaTopic-1
12:29:31,844 WARN  
org.apache.flink.streaming.connectors.kafka.FlinkKafkaProducerBase  - Flushing 
on checkpoint is enabled, but checkpointing is not enabled. Disabling flushing.
12:29:31,846 WARN  
org.apache.flink.streaming.connectors.kafka.FlinkKafkaProducerBase  - Flushing 
on checkpoint is enabled, but checkpointing is not enabled. Disabling flushing.
12:29:31,998 INFO  org.apache.flink.streaming.connectors.kafka.KafkaTestBase
 - Finished writing sequence
12:29:31,998 INFO  org.apache.flink.streaming.connectors.kafka.KafkaTestBase
 - Validating sequence
12:29:32,123 INFO  
org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumerBase  - No 
restore state for FlinkKafkaConsumer.
12:29:32,129 INFO  
org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumerBase  - No 
restore state for FlinkKafkaConsumer.
12:29:32,136 INFO  
org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumerBase  - No 
restore state for FlinkKafkaConsumer.
12:29:32,139 INFO  
org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumerBase  - Consumer 
subtask 0 will start reading the following 1 partitions from the committed 
group offsets in Kafka: 
[KafkaTopicPartition{topic='testCommitOffsetsToKafkaTopic-1', partition=1}]
12:29:32,154 INFO  
org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumerBase  - Consumer 
subtask 2 will start reading the following 1 partitions from the committed 
group offsets in Kafka: 
[KafkaTopicPartition{topic='testCommitOffsetsToKafkaTopic-1', partition=0}]
12:29:32,236 INFO  
org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumerBase  - Consumer 
subtask 1 will start reading the following 1 partitions from the committed 
group offsets in Kafka: 
[KafkaTopicPartition{topic='testCommitOffsetsToKafkaTopic-1', partition=2}]
12:29:32,496 INFO  
org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumerBase  - No 
restore state for FlinkKafkaConsumer.
12:29:32,507 INFO  

[jira] [Created] (FLINK-7285) allow submission of big job graphs

2017-07-27 Thread Nico Kruber (JIRA)
Nico Kruber created FLINK-7285:
--

 Summary: allow submission of big job graphs
 Key: FLINK-7285
 URL: https://issues.apache.org/jira/browse/FLINK-7285
 Project: Flink
  Issue Type: Improvement
  Components: Distributed Coordination, Job-Submission
Affects Versions: 1.4.0
Reporter: Nico Kruber
Assignee: Nico Kruber


In order to support submission of big job graphs, i.e. from user programs with 
a lot of code that does not fit into {{akka.framesize}}, we could try to store 
the {{JobGraph}} in the {{BlobServer}}.

Note that in the communication between the job and task managers, we also need 
the offloading of the {{TaskDeploymentDescriptor}} (FLINK-6046) to complete 
this feature.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (FLINK-7283) PythonPlanBinderTest issues with python paths

2017-07-27 Thread Nico Kruber (JIRA)
Nico Kruber created FLINK-7283:
--

 Summary: PythonPlanBinderTest issues with python paths
 Key: FLINK-7283
 URL: https://issues.apache.org/jira/browse/FLINK-7283
 Project: Flink
  Issue Type: Bug
  Components: Python API, Tests
Affects Versions: 1.3.1, 1.3.0, 1.4.0, 1.3.2
Reporter: Nico Kruber
Assignee: Nico Kruber


There are some issues with {{PythonPlanBinderTest}} and the Python2/3 tests:
- the path is not set correctly (only inside {{config}}, not the 
{{configuration}} that is passed on to the {{PythonPlanBinder}}
- linux distributions have become quite inventive regarding python binary 
names: some offer {{python}} as Python 2, some as Python 3. Similarly, 
{{python3}} and/or {{python2}} may not be available. If we really want to test 
both, we need to take this into account.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (FLINK-7262) remove unused FallbackLibraryCacheManager

2017-07-25 Thread Nico Kruber (JIRA)
Nico Kruber created FLINK-7262:
--

 Summary: remove unused FallbackLibraryCacheManager
 Key: FLINK-7262
 URL: https://issues.apache.org/jira/browse/FLINK-7262
 Project: Flink
  Issue Type: Sub-task
  Components: Distributed Coordination, Network
Affects Versions: 1.4.0
Reporter: Nico Kruber
Assignee: Nico Kruber


{{FallbackLibraryCacheManager}} is basically only used in unit tests nowadays 
and should probably be removed.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (FLINK-7261) avoid unnecessary exceptions in the logs in non-HA cases

2017-07-25 Thread Nico Kruber (JIRA)
Nico Kruber created FLINK-7261:
--

 Summary: avoid unnecessary exceptions in the logs in non-HA cases
 Key: FLINK-7261
 URL: https://issues.apache.org/jira/browse/FLINK-7261
 Project: Flink
  Issue Type: Sub-task
  Components: Distributed Coordination, Network
Affects Versions: 1.4.0
Reporter: Nico Kruber
Assignee: Nico Kruber


{{PermanentBlobCache#getHAFileInternal}} first tries to download files from the 
HA store but if it does not exist, it will log an exception from the attempt to 
move the {{incomingFile}} to its destination which is misleading to the user.

We should extend {{BlobView#get}} to return whether a file was actually copied 
or not, e.g. in the {{VoidBlobStore}} to keep the abstraction of the BLOB 
stores but to not report errors in expected cases (recall that 
{{FileSystemBlobStore#get}} will already throw an exception if anything failed 
in there and if successful but the succeeding move fails the exception from the 
move should still be prevailed).



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (FLINK-7212) JobManagerLeaderSessionIDITSuite not executed

2017-07-17 Thread Nico Kruber (JIRA)
Nico Kruber created FLINK-7212:
--

 Summary: JobManagerLeaderSessionIDITSuite not executed
 Key: FLINK-7212
 URL: https://issues.apache.org/jira/browse/FLINK-7212
 Project: Flink
  Issue Type: Bug
  Components: JobManager, Tests
Affects Versions: 1.4.0
Reporter: Nico Kruber
Assignee: Nico Kruber


{{JobManagerLeaderSessionIDITSuite}} is currently not executed due to its 
naming scheme. Only {{*ITCase}} and {{*Test}} classes are run, except for 
inside {{flink-ml}} which adds more patters to the {{scalatest}} plugin.

Also, {{JobManagerLeaderSessionIDITSuite}} needs to be adapted slightly so that 
it runs successfully.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (FLINK-7196) add a TTL to transient BLOB files

2017-07-14 Thread Nico Kruber (JIRA)
Nico Kruber created FLINK-7196:
--

 Summary: add a TTL to transient BLOB files
 Key: FLINK-7196
 URL: https://issues.apache.org/jira/browse/FLINK-7196
 Project: Flink
  Issue Type: Sub-task
  Components: Distributed Coordination, Network
Affects Versions: 1.4.0
Reporter: Nico Kruber
Assignee: Nico Kruber


Transient BLOB files are not automatically cleaned up unless the 
{{BlobCache}}/{{BlobServer}} are shut down or the files are deleted via the 
{{delete}} methods. Additionally, they should have a default time-to-live (TTL) 
so that they may be cleaned up in failure cases.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (FLINK-7166) generated avro sources not cleaned up or re-created after changes

2017-07-12 Thread Nico Kruber (JIRA)
Nico Kruber created FLINK-7166:
--

 Summary: generated avro sources not cleaned up or re-created after 
changes
 Key: FLINK-7166
 URL: https://issues.apache.org/jira/browse/FLINK-7166
 Project: Flink
  Issue Type: Bug
  Components: Build System
Affects Versions: 1.4.0
Reporter: Nico Kruber
Assignee: Nico Kruber


Since the AVRO upgrade to 1.8.2, I could compile the flink-avro module any more 
with a failure like this in {{mvn clean install -DskipTests -pl 
flink-connectors/flink-avro}}:
{code}
Compilation failure
[ERROR] 
flink-connectors/flink-avro/src/test/java/org/apache/flink/api/io/avro/generated/Fixed16.java:[10,8]
 org.apache.flink.api.io.avro.generated.Fixed16 is not abstract and does not 
override abstract method readExternal(java.io.ObjectInput) in 
org.apache.avro.specific.SpecificFixed
{code}
This was caused by maven both not cleaning up the generated sources and also 
not overwriting them with new ones itself. Only a manual {{rm -rf 
flink-connectors/flink-avro/src/test/java/org/apache/flink/api/io/avro/generated}}
 solved the issue.

The cause for this, though, is that the avro files are generated under the 
{{src}} directory, not {{target/generated-test-sources}} as they should be. 
Either the generated sources should be cleaned up as well, or the generated 
files should be moved to this directory which is a more invasive change due to 
some hacks with respect to these files.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (FLINK-7141) enable travis cache again

2017-07-10 Thread Nico Kruber (JIRA)
Nico Kruber created FLINK-7141:
--

 Summary: enable travis cache again
 Key: FLINK-7141
 URL: https://issues.apache.org/jira/browse/FLINK-7141
 Project: Flink
  Issue Type: Improvement
  Components: Build System, Travis
Affects Versions: 1.4.0
Reporter: Nico Kruber
Assignee: Nico Kruber


In the past, we had some troubles with the travis cache but in general it may 
be a good idea to include it again to speed up build times by reducing the time 
the maven downloads take.

This time, we should also deal with corrupt files in the maven repository and 
[tune 
travis|https://docs.travis-ci.com/user/caching/#Caches-and-build-matrices] so 
that it does not create corrupt caches in the first place.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (FLINK-7140) include a UUID/random name into the BlobKey

2017-07-10 Thread Nico Kruber (JIRA)
Nico Kruber created FLINK-7140:
--

 Summary: include a UUID/random name into the BlobKey
 Key: FLINK-7140
 URL: https://issues.apache.org/jira/browse/FLINK-7140
 Project: Flink
  Issue Type: Sub-task
  Components: Distributed Coordination, Network
Affects Versions: 1.4.0
Reporter: Nico Kruber
Assignee: Nico Kruber


Currently, at the {{BlobServer}}, the id of a BLOB is only based on its 
contents. This may cause issues during cleanup of transient BLOBs with the same 
content (we don't do ref-counting on a per-blob basis anymore) or at the rare 
occasion of a hash collision.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (FLINK-7115) test instability in MiniClusterITCase.runJobWithMultipleJobManagers

2017-07-06 Thread Nico Kruber (JIRA)
Nico Kruber created FLINK-7115:
--

 Summary: test instability in 
MiniClusterITCase.runJobWithMultipleJobManagers
 Key: FLINK-7115
 URL: https://issues.apache.org/jira/browse/FLINK-7115
 Project: Flink
  Issue Type: Bug
  Components: Tests
Affects Versions: 1.4.0
Reporter: Nico Kruber


In a test run with unrelated changes, I to have one case of 
{{MiniClusterITCase}} hanging:

https://s3.amazonaws.com/archive.travis-ci.org/jobs/250775808/log.txt?X-Amz-Expires=30=20170706T151556Z=AWS4-HMAC-SHA256=AKIAJRYRXRSVGNKPKO5A/20170706/us-east-1/s3/aws4_request=host=5b7c512137c7cbd82dcb77a98aeadc3d761b7055bea6d8f07ad6b786dc196f37

{code}
==
Maven produced no output for 300 seconds.
==
==
The following Java processes are running (JPS)
==
12852 Jps
9166 surefirebooter1705381973603203163.jar
4966 Launcher
==
Printing stack trace of Java process 12865
==
12865: No such process
==
Printing stack trace of Java process 9166
==
2017-07-06 15:05:52
Full thread dump OpenJDK 64-Bit Server VM (24.131-b00 mixed mode):

"Attach Listener" daemon prio=10 tid=0x7fc520b1a000 nid=0x3266 waiting on 
condition [0x]
   java.lang.Thread.State: RUNNABLE

"flink-akka.actor.default-dispatcher-9" daemon prio=10 tid=0x7fc520b0e800 
nid=0x23fd waiting on condition [0x7fc51abcb000]
   java.lang.Thread.State: WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for  <0x0007a0ca2c78> (a 
akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinPool)
at scala.concurrent.forkjoin.ForkJoinPool.scan(ForkJoinPool.java:2075)
at 
scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
at 
scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)

"flink-akka.actor.default-dispatcher-8" daemon prio=10 tid=0x7fc520bb9800 
nid=0x23fc waiting on condition [0x7fc51aaca000]
   java.lang.Thread.State: WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for  <0x0007a0ca2c78> (a 
akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinPool)
at scala.concurrent.forkjoin.ForkJoinPool.scan(ForkJoinPool.java:2075)
at 
scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
at 
scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)

"Flink-MetricRegistry-1" prio=10 tid=0x7fc520ba7800 nid=0x23f9 waiting on 
condition [0x7fc51a4c4000]
   java.lang.Thread.State: TIMED_WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for  <0x0007a09699c8> (a 
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
at 
java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:226)
at 
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2082)
at 
java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:1092)
at 
java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:809)
at 
java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1068)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)

"flink-akka.actor.default-dispatcher-7" daemon prio=10 tid=0x7fc520b9d800 
nid=0x23f7 waiting on condition [0x7fc51a6c6000]
   java.lang.Thread.State: WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for  <0x0007a0ca2c78> (a 
akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinPool)
at scala.concurrent.forkjoin.ForkJoinPool.scan(ForkJoinPool.java:2075)
at 
scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
at 
scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)

"flink-akka.actor.default-dispatcher-6" daemon prio=10 tid=0x7fc520aab000 
nid=0x23ef waiting for monitor entry [0x7fc51b0d7000]
   java.lang.Thread.State: BLOCKED (on object monitor)
   

[jira] [Created] (FLINK-7102) improve ClassLoaderITCase

2017-07-04 Thread Nico Kruber (JIRA)
Nico Kruber created FLINK-7102:
--

 Summary: improve ClassLoaderITCase
 Key: FLINK-7102
 URL: https://issues.apache.org/jira/browse/FLINK-7102
 Project: Flink
  Issue Type: Bug
  Components: Tests
Affects Versions: 1.4.0
Reporter: Nico Kruber
Assignee: Nico Kruber


{{ClassLoaderITCase}}...
* unnecessarily runs multiple tests in a single test case
* {{#testDisposeSavepointWithCustomKvState()}} does not cancel its job (thus 
the order of execution of test cases defines the outcome)
* uses {{e.getCause().getCause()}} which may cause {{NullPointerException}}s 
hiding the original error



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (FLINK-7068) change BlobService sub-classes for permanent and transient BLOBs

2017-07-03 Thread Nico Kruber (JIRA)
Nico Kruber created FLINK-7068:
--

 Summary: change BlobService sub-classes for permanent and 
transient BLOBs
 Key: FLINK-7068
 URL: https://issues.apache.org/jira/browse/FLINK-7068
 Project: Flink
  Issue Type: Sub-task
  Components: Distributed Coordination, Network
Affects Versions: 1.4.0
Reporter: Nico Kruber
Assignee: Nico Kruber


A {{PermanentBlobStore}} should resemble use cases for BLOBs that are 
permanently stored for a job's life time (HA and non-HA).

A {{TransientBlobStore}} should reflect BLOB offloading for logs, RPC, etc. 
which even does not have to be reflected by files.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (FLINK-7064) test instability in WordCountMapreduceITCase

2017-07-03 Thread Nico Kruber (JIRA)
Nico Kruber created FLINK-7064:
--

 Summary: test instability in WordCountMapreduceITCase
 Key: FLINK-7064
 URL: https://issues.apache.org/jira/browse/FLINK-7064
 Project: Flink
  Issue Type: Bug
  Components: Tests
Affects Versions: 1.4.0
Reporter: Nico Kruber


Although already mentioned in FLINK-7004, this does not seem fixed yet and 
apparently now became an instable test:

{code}
Running org.apache.flink.test.hadoop.mapred.WordCountMapredITCase
Inflater has been closed
java.lang.NullPointerException: Inflater has been closed
at java.util.zip.Inflater.ensureOpen(Inflater.java:389)
at java.util.zip.Inflater.inflate(Inflater.java:257)
at java.util.zip.InflaterInputStream.read(InflaterInputStream.java:152)
at java.io.FilterInputStream.read(FilterInputStream.java:133)
at sun.nio.cs.StreamDecoder.readBytes(StreamDecoder.java:283)
at sun.nio.cs.StreamDecoder.implRead(StreamDecoder.java:325)
at sun.nio.cs.StreamDecoder.read(StreamDecoder.java:177)
at java.io.InputStreamReader.read(InputStreamReader.java:184)
at java.io.BufferedReader.fill(BufferedReader.java:154)
at java.io.BufferedReader.readLine(BufferedReader.java:317)
at java.io.BufferedReader.readLine(BufferedReader.java:382)
at 
javax.xml.parsers.FactoryFinder.findJarServiceProvider(FactoryFinder.java:319)
at javax.xml.parsers.FactoryFinder.find(FactoryFinder.java:255)
at 
javax.xml.parsers.DocumentBuilderFactory.newInstance(DocumentBuilderFactory.java:121)
at 
org.apache.hadoop.conf.Configuration.loadResource(Configuration.java:2467)
at 
org.apache.hadoop.conf.Configuration.loadResources(Configuration.java:2444)
at 
org.apache.hadoop.conf.Configuration.getProps(Configuration.java:2361)
at org.apache.hadoop.conf.Configuration.get(Configuration.java:968)
at 
org.apache.hadoop.mapred.JobConf.checkAndWarnDeprecation(JobConf.java:2010)
at org.apache.hadoop.mapred.JobConf.(JobConf.java:423)
at 
org.apache.flink.hadoopcompatibility.HadoopInputs.readHadoopFile(HadoopInputs.java:63)
at 
org.apache.flink.test.hadoop.mapred.WordCountMapredITCase.internalRun(WordCountMapredITCase.java:79)
at 
org.apache.flink.test.hadoop.mapred.WordCountMapredITCase.testProgram(WordCountMapredITCase.java:67)
at 
org.apache.flink.test.util.JavaProgramTestBase.testJobWithObjectReuse(JavaProgramTestBase.java:127)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
at 
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
at 
org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
at 
org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
at 
org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
at org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:55)
at org.junit.rules.RunRules.evaluate(RunRules.java:20)
at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
at org.junit.rules.ExternalResource$1.evaluate(ExternalResource.java:48)
at org.junit.rules.RunRules.evaluate(RunRules.java:20)
at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
at 
org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:283)
at 
org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:173)
at 
org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:153)
at 
org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:128)
at 
org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:203)
at 
org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:155)
at 

[jira] [Created] (FLINK-7063) test instability in OperatorStateBackendTest.testSnapshotAsyncCancel

2017-07-03 Thread Nico Kruber (JIRA)
Nico Kruber created FLINK-7063:
--

 Summary: test instability in 
OperatorStateBackendTest.testSnapshotAsyncCancel
 Key: FLINK-7063
 URL: https://issues.apache.org/jira/browse/FLINK-7063
 Project: Flink
  Issue Type: Bug
  Components: State Backends, Checkpointing, Tests
Affects Versions: 1.3.1, 1.4.0
Reporter: Nico Kruber
Assignee: Stefan Richter
Priority: Minor


{{OperatorStateBackendTest.testSnapshotAsyncCancel}} seems to be instable and 
sometimes fails:

{code}
testSnapshotAsyncCancel(org.apache.flink.runtime.state.OperatorStateBackendTest)
  Time elapsed: 0.036 sec  <<< ERROR!
java.util.concurrent.ExecutionException: java.io.IOException: Stream closed.
at java.util.concurrent.FutureTask.report(FutureTask.java:122)
at java.util.concurrent.FutureTask.get(FutureTask.java:206)
at 
org.apache.flink.runtime.state.OperatorStateBackendTest.testSnapshotAsyncCancel(OperatorStateBackendTest.java:636)
Caused by: java.io.IOException: Stream closed.
at 
org.apache.flink.runtime.util.BlockerCheckpointStreamFactory$1.write(BlockerCheckpointStreamFactory.java:95)
at java.io.DataOutputStream.writeInt(DataOutputStream.java:197)
at 
org.apache.flink.core.io.VersionedIOReadableWritable.write(VersionedIOReadableWritable.java:40)
at 
org.apache.flink.runtime.state.OperatorBackendSerializationProxy.write(OperatorBackendSerializationProxy.java:65)
at 
org.apache.flink.runtime.state.DefaultOperatorStateBackend$1.performOperation(DefaultOperatorStateBackend.java:255)
at 
org.apache.flink.runtime.state.DefaultOperatorStateBackend$1.performOperation(DefaultOperatorStateBackend.java:233)
at 
org.apache.flink.runtime.io.async.AbstractAsyncIOCallable.call(AbstractAsyncIOCallable.java:72)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:748)
{code}

logs:
* 
https://s3.amazonaws.com/archive.travis-ci.org/jobs/248822546/log.txt?X-Amz-Expires=30=20170703T092940Z=AWS4-HMAC-SHA256=AKIAJRYRXRSVGNKPKO5A/20170703/us-east-1/s3/aws4_request=host=f468cd238236d7038a1e12086dd4a0e3ba538d93c883790d180e4c63b973a5f2
* https://transfer.sh/MHawk/17392.5.tar.gz



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (FLINK-7057) move BLOB ref-counting from LibraryCacheManager to BlobCache

2017-06-30 Thread Nico Kruber (JIRA)
Nico Kruber created FLINK-7057:
--

 Summary: move BLOB ref-counting from LibraryCacheManager to 
BlobCache
 Key: FLINK-7057
 URL: https://issues.apache.org/jira/browse/FLINK-7057
 Project: Flink
  Issue Type: Sub-task
  Components: Distributed Coordination, Network
Affects Versions: 1.4.0
Reporter: Nico Kruber
Assignee: Nico Kruber


Currently, the {{LibraryCacheManager}} is doing some ref-counting for JAR files 
managed by it. Instead, we want the {{BlobCache}} to do that itself for all 
job-related BLOBs. Also, we do not want to operate on a per-{{BlobKey}} level 
but rather per job. Therefore, the cleanup process should be adapted, too.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (FLINK-7056) add API to allow job-related BLOBs to be stored

2017-06-30 Thread Nico Kruber (JIRA)
Nico Kruber created FLINK-7056:
--

 Summary: add API to allow job-related BLOBs to be stored
 Key: FLINK-7056
 URL: https://issues.apache.org/jira/browse/FLINK-7056
 Project: Flink
  Issue Type: Sub-task
  Components: Distributed Coordination, Network
Affects Versions: 1.4.0
Reporter: Nico Kruber
Assignee: Nico Kruber


To ease cleanup, we will make job-related BLOBs be reflected in the blob 
storage so that they may be removed along with the job. This adds the jobId to 
many methods similar to the previous code from the {{NAME_ADDRESSABLE}} mode.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (FLINK-7055) refactor BlobService#getURL() methods to return a File object

2017-06-30 Thread Nico Kruber (JIRA)
Nico Kruber created FLINK-7055:
--

 Summary: refactor BlobService#getURL() methods to return a File 
object
 Key: FLINK-7055
 URL: https://issues.apache.org/jira/browse/FLINK-7055
 Project: Flink
  Issue Type: Sub-task
  Components: Distributed Coordination, Network
Affects Versions: 1.4.0
Reporter: Nico Kruber
Assignee: Nico Kruber


As a relic from its use by the {{UrlClassLoader}}, {{BlobService#getURL()}} 
methods always returned {{URL}} objects although they were always pointing to 
locally cached files. As a step towards a better architecture and API, these 
should return a File object instead.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (FLINK-7054) remove LibraryCacheManager#getFile()

2017-06-30 Thread Nico Kruber (JIRA)
Nico Kruber created FLINK-7054:
--

 Summary: remove LibraryCacheManager#getFile()
 Key: FLINK-7054
 URL: https://issues.apache.org/jira/browse/FLINK-7054
 Project: Flink
  Issue Type: Sub-task
  Components: Distributed Coordination, Network
Affects Versions: 1.4.0
Reporter: Nico Kruber
Assignee: Nico Kruber


{{LibraryCacheManager#getFile()}} was only used in tests where it is avoidable 
but if used anywhere else, it may have caused cleanup issues.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (FLINK-7053) improve code quality in some tests

2017-06-30 Thread Nico Kruber (JIRA)
Nico Kruber created FLINK-7053:
--

 Summary: improve code quality in some tests
 Key: FLINK-7053
 URL: https://issues.apache.org/jira/browse/FLINK-7053
 Project: Flink
  Issue Type: Sub-task
  Components: Distributed Coordination, Network
Affects Versions: 1.4.0
Reporter: Nico Kruber
Assignee: Nico Kruber


* {{BlobClientTest}} and {{BlobClientSslTest}} share a lot of common code
* the received buffers there are currently not verified for being equal to the 
expected one
* {{TemporarFolder}} should be used throughout blob store tests



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (FLINK-7052) remove NAME_ADDRESSABLE mode

2017-06-30 Thread Nico Kruber (JIRA)
Nico Kruber created FLINK-7052:
--

 Summary: remove NAME_ADDRESSABLE mode
 Key: FLINK-7052
 URL: https://issues.apache.org/jira/browse/FLINK-7052
 Project: Flink
  Issue Type: Sub-task
  Components: Network
Affects Versions: 1.4.0
Reporter: Nico Kruber
Assignee: Nico Kruber


Remove the BLOB store's {{NAME_ADDRESSABLE}} mode as it is currently not used 
and partly broken.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (FLINK-7012) remove user-JAR upload when disposing a savepoint the old way

2017-06-27 Thread Nico Kruber (JIRA)
Nico Kruber created FLINK-7012:
--

 Summary: remove user-JAR upload when disposing a savepoint the old 
way
 Key: FLINK-7012
 URL: https://issues.apache.org/jira/browse/FLINK-7012
 Project: Flink
  Issue Type: Bug
  Components: Client, State Backends, Checkpointing
Affects Versions: 1.4.0
Reporter: Nico Kruber
Assignee: Nico Kruber
Priority: Minor


Inside {{CliFrontend#disposeSavepoint()}}, user JAR files are being uploaded to 
the {{BlobServer}} but they are actually not used (also also not cleaned up) in 
the job manager's handling of the {{DisposeSavepoint}} message.

Since removing new savepoints is as simple as deleting files and old savepoints 
have always worked without these user JARs, we should remove the upload to be 
able to make the JAR file upload jobId-dependent for FLINK-6916.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (FLINK-6916) FLIP-19: Improved BLOB storage architecture

2017-06-14 Thread Nico Kruber (JIRA)
Nico Kruber created FLINK-6916:
--

 Summary: FLIP-19: Improved BLOB storage architecture
 Key: FLINK-6916
 URL: https://issues.apache.org/jira/browse/FLINK-6916
 Project: Flink
  Issue Type: Improvement
  Components: Network
Affects Versions: 1.4.0
Reporter: Nico Kruber
Assignee: Nico Kruber


The current architecture around the BLOB server and cache components seems 
rather patched up and has some issues regarding concurrency ([FLINK-6380]), 
cleanup, API inconsistencies / currently unused API ([FLINK-6329], 
[FLINK-6008]). These make future integration with FLIP-6 or extensions like 
offloading oversized RPC messages ([FLINK-6046]) difficult. We therefore 
propose an improvement on the current architecture as described below which 
tackles these issues, provides some cleanup, and enables further BLOB server 
use cases.

Please refer to 
https://cwiki.apache.org/confluence/display/FLINK/FLIP-19%3A+Improved+BLOB+storage+architecture
 for a full overview on the proposed changes.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (FLINK-6791) Using MemoryStateBackend as checkpoint stream back-end may block checkpoint/savepoint creation

2017-06-01 Thread Nico Kruber (JIRA)
Nico Kruber created FLINK-6791:
--

 Summary: Using MemoryStateBackend as checkpoint stream back-end 
may block checkpoint/savepoint creation
 Key: FLINK-6791
 URL: https://issues.apache.org/jira/browse/FLINK-6791
 Project: Flink
  Issue Type: Bug
  Components: State Backends, Checkpointing
Affects Versions: 1.2.1, 1.3.0
Reporter: Nico Kruber


If the `MemoryStateBackend` is used as the checkpoint stream back-end in e.g. 
RocksDBStateBackend, it will block further checkpoint/savepoint creation if the 
checkpoint data reaches the back-end's max state size. In that case, an error 
message is logged at the task manager but the save-/checkpoint never completes 
and although the job continues, no further checkpoints will be made.

Please see the following example that should be reproducible:

{code:java}
env.setStateBackend(new RocksDBStateBackend(new MemoryStateBackend(1000 * 1024 
* 1024, false), false));

env.enableCheckpointing(100L);

final long numKeys = 100_000L;
DataStreamSource source1 =
env.addSource(new RichParallelSourceFunction() {
private volatile boolean running = true;

@Override
public void run(SourceContext ctx) throws 
Exception {
long counter = 0;

while (running) {
synchronized (ctx.getCheckpointLock()) {
ctx.collect(Tuple1.of(counter % 
numKeys));
counter++;
}

Thread.yield();
}
}

@Override
public void cancel() {
running = false;
}
});

source1.keyBy(0)
.map(new RichMapFunction() {
private transient ValueState val;

@Override
public Tuple1 map(Tuple1 value)
throws Exception {
val.update(Collections.nCopies(100, value.f0));
return value;
}

@Override
public void open(final Configuration parameters) throws 
Exception {
ValueStateDescriptor descriptor =
new ValueStateDescriptor<>(
"data", // the 
state name

TypeInformation.of(new TypeHint() {
}) // type 
information
);
val = getRuntimeContext().getState(descriptor);
}
}).uid("identity-map-with-state")
.addSink(new DiscardingSink());

env.execute("failingsnapshots");
{code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (FLINK-6784) Add some notes about externalized checkpoints and the difference to savepoints

2017-05-31 Thread Nico Kruber (JIRA)
Nico Kruber created FLINK-6784:
--

 Summary: Add some notes about externalized checkpoints and the 
difference to savepoints
 Key: FLINK-6784
 URL: https://issues.apache.org/jira/browse/FLINK-6784
 Project: Flink
  Issue Type: Sub-task
  Components: Documentation, State Backends, Checkpointing
Affects Versions: 1.3.0
Reporter: Nico Kruber
Assignee: Nico Kruber


while externalized checkpoints are described somehow, there does not seem to be 
any paragraph explaining the difference to savepoints, also there are two 
checkpointing docs which could at least be linked somehow



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (FLINK-6782) Update savepoint documentation

2017-05-31 Thread Nico Kruber (JIRA)
Nico Kruber created FLINK-6782:
--

 Summary: Update savepoint documentation
 Key: FLINK-6782
 URL: https://issues.apache.org/jira/browse/FLINK-6782
 Project: Flink
  Issue Type: Sub-task
  Components: Documentation, State Backends, Checkpointing
Affects Versions: 1.3.0
Reporter: Nico Kruber
Assignee: Nico Kruber


Savepoint documentation is a bit outdated regarding full data being stored in 
the savepoint path, not just a metadata file



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (FLINK-6774) build-helper-maven-plugin version not set

2017-05-30 Thread Nico Kruber (JIRA)
Nico Kruber created FLINK-6774:
--

 Summary: build-helper-maven-plugin version not set
 Key: FLINK-6774
 URL: https://issues.apache.org/jira/browse/FLINK-6774
 Project: Flink
  Issue Type: Improvement
  Components: Build System
Affects Versions: 1.3.0
Reporter: Nico Kruber
Assignee: Nico Kruber
Priority: Minor


some modules forgot to specify the version of their 
{{build-helper-maven-plugin}} which causes the following warning in a maven 
build:

{code}
[WARNING] 'build.plugins.plugin.version' for 
org.codehaus.mojo:build-helper-maven-plugin is missing. @ 
org.apache.flink:flink-connector-kafka-base_${scala.binary.version}:[unknown-version],
 /home/nico/Projects/flink/flink-connectors/flink-connector-kafka-base/pom.xml, 
line 216, column 12
[WARNING] It is highly recommended to fix these problems because they threaten 
the stability of your build.
{code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (FLINK-6706) remove ChaosMonkeyITCase

2017-05-24 Thread Nico Kruber (JIRA)
Nico Kruber created FLINK-6706:
--

 Summary: remove ChaosMonkeyITCase
 Key: FLINK-6706
 URL: https://issues.apache.org/jira/browse/FLINK-6706
 Project: Flink
  Issue Type: Improvement
  Components: Tests
Affects Versions: 1.3.0
Reporter: Nico Kruber


{{ChaosMonkeyITCase}} has been disabled since Dec 2015 and is probably 
outdated. In its current form, it doesn't make sense to keep it and would need 
a replacement if desired.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (FLINK-6702) SIGABRT after CEPOperatorTest#testCEPOperatorSerializationWRocksDB() during GC

2017-05-24 Thread Nico Kruber (JIRA)
Nico Kruber created FLINK-6702:
--

 Summary: SIGABRT after 
CEPOperatorTest#testCEPOperatorSerializationWRocksDB() during GC
 Key: FLINK-6702
 URL: https://issues.apache.org/jira/browse/FLINK-6702
 Project: Flink
  Issue Type: Bug
  Components: CEP, Tests
Affects Versions: 1.3.0
Reporter: Nico Kruber


During the CEP unit tests, when garbage collection kicks in and tries to 
finalize RocksDB, it may fail with
{code}
pure virtual method called
terminate called without an active exception
Process finished with exit code 134 (interrupted by signal 6: SIGABRT)
{code}

Reason is a missing {{harness.close()}} call in 
{{CEPOperatorTest#testCEPOperatorSerializationWRocksDB()}}.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (FLINK-6689) Remote StreamExecutionEnvironment fails to submit jobs against LocalFlinkMiniCluster

2017-05-23 Thread Nico Kruber (JIRA)
Nico Kruber created FLINK-6689:
--

 Summary: Remote StreamExecutionEnvironment fails to submit jobs 
against LocalFlinkMiniCluster
 Key: FLINK-6689
 URL: https://issues.apache.org/jira/browse/FLINK-6689
 Project: Flink
  Issue Type: Bug
  Components: Job-Submission
Affects Versions: 1.3.0
Reporter: Nico Kruber
 Fix For: 1.3.0


The following Flink programs fails to execute with the current 1.3 branch (1.2 
works):

{code:java}
final String jobManagerAddress = "localhost";
final int jobManagerPort = ConfigConstants.DEFAULT_JOB_MANAGER_IPC_PORT;

final Configuration config = new Configuration();
config.setString(ConfigConstants.JOB_MANAGER_IPC_ADDRESS_KEY, 
jobManagerAddress);
config.setInteger(ConfigConstants.JOB_MANAGER_IPC_PORT_KEY, 
jobManagerPort);
config.setBoolean(ConfigConstants.LOCAL_START_WEBSERVER, true);

final LocalFlinkMiniCluster cluster = new LocalFlinkMiniCluster(config, false);
cluster.start(true);

final StreamExecutionEnvironment env = 
StreamExecutionEnvironment.createRemoteEnvironment(jobManagerAddress, 
jobManagerPort);

env.fromElements(1l).addSink(new DiscardingSink());

// fails due to leader session id being wrong:
env.execute("test");
{code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (FLINK-6680) App & Flink migration guide: updates for the 1.3 release

2017-05-23 Thread Nico Kruber (JIRA)
Nico Kruber created FLINK-6680:
--

 Summary: App & Flink migration guide: updates for the 1.3 release
 Key: FLINK-6680
 URL: https://issues.apache.org/jira/browse/FLINK-6680
 Project: Flink
  Issue Type: Sub-task
Reporter: Nico Kruber


The "Upgrading Applications and Flink Versions" at {{docs/ops/upgrading.md}} 
does not contain any info on Flink 1.3 yet.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (FLINK-6678) Migration guide: add note about removed log4j default logger from core artefacts

2017-05-23 Thread Nico Kruber (JIRA)
Nico Kruber created FLINK-6678:
--

 Summary: Migration guide: add note about removed log4j default 
logger from core artefacts
 Key: FLINK-6678
 URL: https://issues.apache.org/jira/browse/FLINK-6678
 Project: Flink
  Issue Type: Sub-task
  Components: Build System, Documentation
Affects Versions: 1.3.0
Reporter: Nico Kruber
 Fix For: 1.3.0


The migration guide at {{docs/dev/migration.md}} needs to be extended with some 
notes about the removed specific logger dependencies in the Flink core 
artefacts (FLINK-6415).

This is valid for applications embedding flink. Examples and quickstarts have 
been adding their loggers already but other projects may need to add those. In 
maven's {{pom.xml}}, for example, by adding the following dependencies

{code:xml}

org.slf4j
slf4j-log4j12
1.7.7



log4j
log4j
1.2.17

{code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (FLINK-6677) Add Table API changes to the migration guide

2017-05-23 Thread Nico Kruber (JIRA)
Nico Kruber created FLINK-6677:
--

 Summary: Add Table API changes to the migration guide
 Key: FLINK-6677
 URL: https://issues.apache.org/jira/browse/FLINK-6677
 Project: Flink
  Issue Type: Sub-task
  Components: Documentation, Table API & SQL
Affects Versions: 1.3.0
Reporter: Nico Kruber
 Fix For: 1.3.0


The migration guide at {{docs/dev/migration.md}} needs to be extended with some 
notes about the API changes.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (FLINK-6676) Add QueryableStateClient changed to the migration guide

2017-05-23 Thread Nico Kruber (JIRA)
Nico Kruber created FLINK-6676:
--

 Summary: Add QueryableStateClient changed to the migration guide
 Key: FLINK-6676
 URL: https://issues.apache.org/jira/browse/FLINK-6676
 Project: Flink
  Issue Type: Sub-task
  Components: Documentation, Queryable State
Affects Versions: 1.3.0
Reporter: Nico Kruber
 Fix For: 1.3.0


The migration guide at {{docs/dev/migration.md}} needs to be extended with some 
notes about the API changes:
* changes in the constructor
* more?



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (FLINK-6674) Update release 1.3 docs

2017-05-23 Thread Nico Kruber (JIRA)
Nico Kruber created FLINK-6674:
--

 Summary: Update release 1.3 docs
 Key: FLINK-6674
 URL: https://issues.apache.org/jira/browse/FLINK-6674
 Project: Flink
  Issue Type: Improvement
  Components: Documentation
Affects Versions: 1.3.0
Reporter: Nico Kruber
 Fix For: 1.3.0


Umbrella issue to track required updates to the documentation for the 1.3 
release.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (FLINK-6670) remove CommonTestUtils.createTempDirectory()

2017-05-23 Thread Nico Kruber (JIRA)
Nico Kruber created FLINK-6670:
--

 Summary: remove CommonTestUtils.createTempDirectory()
 Key: FLINK-6670
 URL: https://issues.apache.org/jira/browse/FLINK-6670
 Project: Flink
  Issue Type: Bug
  Components: Tests
Reporter: Nico Kruber
Priority: Minor


{{CommonTestUtils.createTempDirectory()}} encourages a dangerous design pattern 
with potential concurrency issues in the unit tests which could be solved by 
using the following pattern instead.
{code:java}
@Rule
public TemporaryFolder tempFolder = new TemporaryFolder();
{code}

We should therefore remove {{CommonTestUtils.createTempDirectory()}}.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (FLINK-6659) RocksDBMergeIteratorTest leaving temporary directories behind

2017-05-22 Thread Nico Kruber (JIRA)
Nico Kruber created FLINK-6659:
--

 Summary: RocksDBMergeIteratorTest leaving temporary directories 
behind
 Key: FLINK-6659
 URL: https://issues.apache.org/jira/browse/FLINK-6659
 Project: Flink
  Issue Type: Bug
  Components: Tests
Affects Versions: 1.2.1, 1.2.0, 1.3.0, 1.2.2
Reporter: Nico Kruber
Assignee: Nico Kruber


{{RocksDBMergeIteratorTest}} uses a newly created temporary directory for its 
RocksDB instance but does not delete is when finished. We should better replace 
this pattern with a proper {{@Rule}}-based approach



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (FLINK-6654) missing maven dependency on "flink-shaded-hadoop2-uber" in flink-dist

2017-05-22 Thread Nico Kruber (JIRA)
Nico Kruber created FLINK-6654:
--

 Summary: missing maven dependency on "flink-shaded-hadoop2-uber" 
in flink-dist
 Key: FLINK-6654
 URL: https://issues.apache.org/jira/browse/FLINK-6654
 Project: Flink
  Issue Type: Bug
  Components: Build System
Affects Versions: 1.3.0
Reporter: Nico Kruber
Assignee: Nico Kruber
 Fix For: 1.3.0


Since applying FLINK-6514, flink-dist includes 
{{flink-shaded-hadoop2-uber-*.jar}} but without giving this dependency in its 
{{pom.xml}}. This may lead to concurrency issues during builds but also fails 
building the flink-dist module only (with dependencies) as in

{code}
mvn clean install -pl flink-dist -am
{code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (FLINK-6380) BlobService concurrency issues between delete and put/get methods

2017-04-25 Thread Nico Kruber (JIRA)
Nico Kruber created FLINK-6380:
--

 Summary: BlobService concurrency issues between delete and put/get 
methods
 Key: FLINK-6380
 URL: https://issues.apache.org/jira/browse/FLINK-6380
 Project: Flink
  Issue Type: Bug
  Components: Network
Affects Versions: 1.3.0
Reporter: Nico Kruber


{{BlobCache#deleteAll(JobID)}} deletes the job directory which is only created 
at the start of {{BlobCache#getURL(BlobKey)}} which then relies on the 
directory being present.

This is not restricted to the {{BlobCache}}, though, but also affects the 
{{BlobServer}} in two ways:
1) its own local storage and
2) its backing {{BlobStore}}

For the latter, i.e. in {{FileSystemBlobStore}}, there is no guarantee that a 
directory will not be deleted concurrently (from a {{delete}} method) between 
its creation and writing a file (in a {{get}} method):

* the {{delete}} method for name-addressable blobs always deletes the 
job-specific storage directory if there is no further blob for this job
* the content-addressable blobs do that similarly but are shared among jobs and 
thus only delete directories if there is no other blob.

Since name-addressable blobs have not been used so far and the latter case 
typically does not occur concurrently with get/put requests, this has not been 
a problem so far but is more relevant after applying FLINK-6046.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (FLINK-6329) refactor name-addressable accessors in BlobService classes from returning a URL to a File or InputStream

2017-04-19 Thread Nico Kruber (JIRA)
Nico Kruber created FLINK-6329:
--

 Summary: refactor name-addressable accessors in BlobService 
classes from returning a URL to a File or InputStream
 Key: FLINK-6329
 URL: https://issues.apache.org/jira/browse/FLINK-6329
 Project: Flink
  Issue Type: Improvement
  Components: Network
Reporter: Nico Kruber
Priority: Minor


The fact that the BLOB-by-hash, i.e. {{CONTENT_ADDRESSABLE}}, methods are typed 
to {{URL}} is an artefact of it being used for libraries with a 
{{URLClassLoader}}.

For the {{NAME_ADDRESSABLE}} blobs, we could type all the methods like 
{{get(JobId, String)}} to {{File}} or to {{InputStream}}, rather then to 
{{URL}}. That means we should also change the method names to {{getBlob}} or 
something like that.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (FLINK-6320) Flakey JobManagerHAJobGraphRecoveryITCase

2017-04-18 Thread Nico Kruber (JIRA)
Nico Kruber created FLINK-6320:
--

 Summary: Flakey JobManagerHAJobGraphRecoveryITCase
 Key: FLINK-6320
 URL: https://issues.apache.org/jira/browse/FLINK-6320
 Project: Flink
  Issue Type: Bug
  Components: Tests
Affects Versions: 1.3.0
Reporter: Nico Kruber


it looks as if there is a race condition in the cleanup of 
{{JobManagerHAJobGraphRecoveryITCase}}.

{code}
Tests run: 2, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 50.271 sec <<< 
FAILURE! - in org.apache.flink.test.recovery.JobManagerHAJobGraphRecoveryITCase
testJobPersistencyWhenJobManagerShutdown(org.apache.flink.test.recovery.JobManagerHAJobGraphRecoveryITCase)
  Time elapsed: 0.129 sec  <<< ERROR!
java.io.FileNotFoundException: File does not exist: 
/tmp/9b63934b-789d-428c-aa9e-47d5d8fa1e32/recovery/submittedJobGraphf763d61fba47
at org.apache.commons.io.FileUtils.forceDelete(FileUtils.java:2275)
at org.apache.commons.io.FileUtils.cleanDirectory(FileUtils.java:1653)
at org.apache.commons.io.FileUtils.deleteDirectory(FileUtils.java:1535)
at org.apache.commons.io.FileUtils.forceDelete(FileUtils.java:2270)
at org.apache.commons.io.FileUtils.cleanDirectory(FileUtils.java:1653)
at 
org.apache.flink.test.recovery.JobManagerHAJobGraphRecoveryITCase.cleanUp(JobManagerHAJobGraphRecoveryITCase.java:112)
{code}
Full log: https://s3.amazonaws.com/archive.travis-ci.org/jobs/223124016/log.txt

Maybe a rule-based temporary directory is a better solution:
{code:java}
@Rule
public TemporaryFolder tempFolder = new TemporaryFolder();
{code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (FLINK-6305) flink-ml tests are executed in all flink-fast-test profiles

2017-04-13 Thread Nico Kruber (JIRA)
Nico Kruber created FLINK-6305:
--

 Summary: flink-ml tests are executed in all flink-fast-test 
profiles
 Key: FLINK-6305
 URL: https://issues.apache.org/jira/browse/FLINK-6305
 Project: Flink
  Issue Type: Improvement
  Components: Tests
Affects Versions: 1.3.0
Reporter: Nico Kruber
Priority: Minor


The {{flink-fast-tests-\*}} profiles partition the unit tests based on their 
starting letter. However, this does not affect the Scala tests run via the 
ScalaTest plugin and therefore, {{flink-ml}} tests are executed in all three 
currently existing profiles. While this may not be that grave, it does run for 
about 2.5 minutes on Travis CI which may be saved in 2/3 of the profiles there.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (FLINK-6299) make all IT cases extend from TestLogger

2017-04-12 Thread Nico Kruber (JIRA)
Nico Kruber created FLINK-6299:
--

 Summary: make all IT cases extend from TestLogger
 Key: FLINK-6299
 URL: https://issues.apache.org/jira/browse/FLINK-6299
 Project: Flink
  Issue Type: Improvement
  Components: Tests
Affects Versions: 1.3.0
Reporter: Nico Kruber
Assignee: Nico Kruber
Priority: Minor


Not all of the integration tests extend from {{TestLogger}} but this is a very 
helpful tool so the currently running tests are written to the logs as well as 
their failures, especially for those tests where errors are often burried in 
the logs.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (FLINK-6293) Flakey JobManagerITCase

2017-04-11 Thread Nico Kruber (JIRA)
Nico Kruber created FLINK-6293:
--

 Summary: Flakey JobManagerITCase
 Key: FLINK-6293
 URL: https://issues.apache.org/jira/browse/FLINK-6293
 Project: Flink
  Issue Type: Bug
  Components: Job-Submission, Tests
Affects Versions: 1.3.0
Reporter: Nico Kruber


Quite seldomly, {{JobManagerITCase}} seems to hang, e.g. see 
https://api.travis-ci.org/jobs/220888193/log.txt?deansi=true 

The maven watchdog kills the build due to not output being produced within 300s 
and {{JobManagerITCase}} seems to hang in line 772, i.e.
{code:title=JobManagerITCase lines 
770-772|language=java|linenumbers=true|firstline=770}
// Trigger savepoint for non-existing job
jobManager.tell(TriggerSavepoint(jobId, Option.apply("any")), testActor)
val response = expectMsgType[TriggerSavepointFailure](deadline.timeLeft)
{code}

Although the (downloaded) logs do not quite allow a precise mapping to this 
test case, it looks as if the following block may be related:

{code}
09:34:47,684 INFO  org.apache.flink.runtime.minicluster.FlinkMiniCluster
 - Akka ask timeout set to 100s
09:34:47,777 INFO  org.apache.flink.runtime.minicluster.FlinkMiniCluster
 - Disabled queryable state server
09:34:47,777 INFO  org.apache.flink.runtime.minicluster.FlinkMiniCluster
 - Starting FlinkMiniCluster.
09:34:47,809 INFO  akka.event.slf4j.Slf4jLogger 
 - Slf4jLogger started
09:34:47,837 INFO  org.apache.flink.runtime.blob.BlobServer 
 - Created BLOB server storage directory 
/tmp/blobStore-eab23d04-ea18-4dc5-b1df-fcf9fc295062
09:34:47,838 WARN  org.apache.flink.runtime.net.SSLUtils
 - Not a SSL socket, will skip setting tls version and cipher suites.
09:34:47,839 INFO  org.apache.flink.runtime.blob.BlobServer 
 - Started BLOB server at 0.0.0.0:36745 - max concurrent requests: 50 - max 
backlog: 1000
09:34:47,840 INFO  org.apache.flink.runtime.metrics.MetricRegistry  
 - No metrics reporter configured, no metrics will be exposed/reported.
09:34:47,850 INFO  org.apache.flink.runtime.testingUtils.TestingMemoryArchivist 
 - Started memory archivist akka://flink/user/archive_1
09:34:47,860 INFO  org.apache.flink.runtime.testutils.TestingResourceManager
 - Trying to associate with JobManager leader akka://flink/user/jobmanager_1
09:34:47,861 INFO  org.apache.flink.runtime.testingUtils.TestingJobManager  
 - Starting JobManager at akka://flink/user/jobmanager_1.
09:34:47,862 WARN  org.apache.flink.runtime.testingUtils.TestingJobManager  
 - Discard message 
LeaderSessionMessage(----,TriggerSavepoint(6e813070338a23b0ff571646bca56521,Some(any)))
 because there is currently no valid leader id known.
09:34:47,862 INFO  org.apache.flink.runtime.testingUtils.TestingJobManager  
 - JobManager akka://flink/user/jobmanager_1 was granted leadership with leader 
session ID Some(----).
09:34:47,867 INFO  org.apache.flink.runtime.testutils.TestingResourceManager
 - Resource Manager associating with leading JobManager 
Actor[akka://flink/user/jobmanager_1#-652927556] - leader session 
----
{code}

If so, then this may be related to FLINK-6287 and may possibly even be a 
duplicate.

What is strange though is that the timeout for the expected message to arrive 
is no more than 2m and thus the test should properly fail within 300s.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (FLINK-6292) Travis: transfer.sh not accepting uploads via http:// anymore

2017-04-11 Thread Nico Kruber (JIRA)
Nico Kruber created FLINK-6292:
--

 Summary: Travis: transfer.sh not accepting uploads via http:// 
anymore
 Key: FLINK-6292
 URL: https://issues.apache.org/jira/browse/FLINK-6292
 Project: Flink
  Issue Type: Bug
  Components: Tests
Affects Versions: 1.2.0, 1.3.0, 1.1.5
Reporter: Nico Kruber
Assignee: Nico Kruber


The {{travis_mvn_watchdog.sh}} script tries to upload the logs to transfer.sh 
but it seems like they do not accept uploads to {{http://transfer.sh}} anymore 
and only accept {{https}} nowadays.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (FLINK-6287) Flakey JobManagerRegistrationTest

2017-04-10 Thread Nico Kruber (JIRA)
Nico Kruber created FLINK-6287:
--

 Summary: Flakey JobManagerRegistrationTest
 Key: FLINK-6287
 URL: https://issues.apache.org/jira/browse/FLINK-6287
 Project: Flink
  Issue Type: Bug
  Components: JobManager
Affects Versions: 1.3.0
 Environment: unit tests
Reporter: Nico Kruber


There seems to be a race condition in the "{{JobManagerRegistrationTest.The 
JobManager should handle repeated registration calls}}" (scala) unit test.
Every so often, especially when my system is under load, this test fails with a 
timeout after seeing the following messages in the log4j INFO outputs:

{code}
14:18:42,257 INFO org.apache.flink.runtime.testutils.TestingResourceManager - 
Trying to associate with JobManager leader akka://flink/user/$f#-1062324203
14:18:42,253 INFO org.apache.flink.runtime.jobmanager.JobManager - Starting 
JobManager at akka://flink/user/$f.
14:18:42,258 WARN org.apache.flink.runtime.jobmanager.JobManager - Discard 
message 
LeaderSessionMessage(----,RegisterTaskManager(8b6ad3eccdbfcb199df630b794b6ec0c,8b6ad3eccdbfcb199df630b794b6ec0c
 @ nico-work.fritz.box (dataPort=1),cores=4, physMem=16686931968, 
heap=1556938752, managed=10,1)) because there is currently no valid leader id 
known.
14:18:42,259 WARN org.apache.flink.runtime.jobmanager.JobManager - Discard 
message 
LeaderSessionMessage(----,RegisterTaskManager(8b6ad3eccdbfcb199df630b794b6ec0c,8b6ad3eccdbfcb199df630b794b6ec0c
 @ nico-work.fritz.box (dataPort=1),cores=4, physMem=16686931968, 
heap=1556938752, managed=10,1)) because there is currently no valid leader id 
known.
14:18:42,259 WARN org.apache.flink.runtime.jobmanager.JobManager - Discard 
message 
LeaderSessionMessage(----,RegisterTaskManager(8b6ad3eccdbfcb199df630b794b6ec0c,8b6ad3eccdbfcb199df630b794b6ec0c
 @ nico-work.fritz.box (dataPort=1),cores=4, physMem=16686931968, 
heap=1556938752, managed=10,1)) because there is currently no valid leader id 
known.
14:18:42,259 WARN org.apache.flink.runtime.jobmanager.JobManager - Discard 
message 
LeaderSessionMessage(----,RegisterResourceManager
 akka://flink/user/resourcemanager-61e00f37-9e99-4355-9099-53b992e8232e) 
because there is currently no valid leader id known.
14:18:42,259 INFO org.apache.flink.runtime.jobmanager.JobManager - JobManager 
akka://flink/user/$f was granted leadership with leader session ID 
Some(----).
{code}

Full log:
{code}
 14:18:42,230 INFO org.apache.flink.runtime.blob.BlobServer - Created BLOB 
server storage directory /tmp/blobStore-d09b13ee-26bb-4ec8-950a-956f4a6c16cf
14:18:42,231 WARN org.apache.flink.runtime.net.SSLUtils - Not a SSL socket, 
will skip setting tls version and cipher suites.
14:18:42,236 INFO org.apache.flink.runtime.blob.BlobServer - Started BLOB 
server at 0.0.0.0:39695 - max concurrent requests: 50 - max backlog: 1000
14:18:42,247 INFO org.apache.flink.runtime.metrics.MetricRegistry - No metrics 
reporter configured, no metrics will be exposed/reported.
14:18:42,249 WARN org.apache.flink.runtime.metrics.MetricRegistry - Could not 
start MetricDumpActor. No metrics will be submitted to the WebInterface.
akka.actor.InvalidActorNameException: actor name [MetricQueryService] is not 
unique!
at 
akka.actor.dungeon.ChildrenContainer$NormalChildrenContainer.reserve(ChildrenContainer.scala:130)
at akka.actor.dungeon.Children$class.reserveChild(Children.scala:76)
at akka.actor.ActorCell.reserveChild(ActorCell.scala:369)
at akka.actor.dungeon.Children$class.makeChild(Children.scala:201)
at akka.actor.dungeon.Children$class.attachChild(Children.scala:41)
at akka.actor.ActorCell.attachChild(ActorCell.scala:369)
at akka.actor.ActorSystemImpl.actorOf(ActorSystem.scala:553)
at 
org.apache.flink.runtime.metrics.dump.MetricQueryService.startMetricQueryService(MetricQueryService.java:170)
at 
org.apache.flink.runtime.metrics.MetricRegistry.startQueryService(MetricRegistry.java:166)
at 
org.apache.flink.runtime.jobmanager.JobManager$.startJobManagerActors(JobManager.scala:2720)
at 
org.apache.flink.runtime.jobmanager.JobManagerRegistrationTest.org$apache$flink$runtime$jobmanager$JobManagerRegistrationTest$$startTestingJobManager(JobManagerRegistrationTest.scala:182)
at 
org.apache.flink.runtime.jobmanager.JobManagerRegistrationTest$$anonfun$1$$anonfun$apply$mcV$sp$4.apply$mcV$sp(JobManagerRegistrationTest.scala:123)
at 
org.apache.flink.runtime.jobmanager.JobManagerRegistrationTest$$anonfun$1$$anonfun$apply$mcV$sp$4.apply(JobManagerRegistrationTest.scala:121)
at 
org.apache.flink.runtime.jobmanager.JobManagerRegistrationTest$$anonfun$1$$anonfun$apply$mcV$sp$4.apply(JobManagerRegistrationTest.scala:121)
at org.scalatest.Transformer$$anonfun$apply$1.apply$mcV$sp(Transformer.scala:22)
at 

[jira] [Created] (FLINK-6270) Port several network config parameters to ConfigOption

2017-04-06 Thread Nico Kruber (JIRA)
Nico Kruber created FLINK-6270:
--

 Summary: Port several network config parameters to ConfigOption
 Key: FLINK-6270
 URL: https://issues.apache.org/jira/browse/FLINK-6270
 Project: Flink
  Issue Type: Improvement
  Components: Network
Affects Versions: 1.3.0
Reporter: Nico Kruber
Assignee: Nico Kruber
Priority: Minor


I'd like to port some memory and network buffers related config options to  new 
{{ConfigOption}} instances before continuing with FLINK-4545. These include:

* {{taskmanager.memory.size}}
* {{taskmanager.memory.fraction}}
* {{taskmanager.memory.off-heap}}
* {{taskmanager.memory.preallocate}}
* {{taskmanager.network.numberOfBuffers}}
* {{taskmanager.memory.segment-size}}

Some of these already existed as {{ConfigOption}} instances in 
{{MiniClusterConfiguration}}.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (FLINK-6064) wrong BlobServer hostname in TaskExecutor

2017-03-15 Thread Nico Kruber (JIRA)
Nico Kruber created FLINK-6064:
--

 Summary: wrong BlobServer hostname in TaskExecutor
 Key: FLINK-6064
 URL: https://issues.apache.org/jira/browse/FLINK-6064
 Project: Flink
  Issue Type: Bug
  Components: Distributed Coordination
Affects Versions: 1.3.0
Reporter: Nico Kruber
Assignee: Nico Kruber


The hostname currently used for the connection to the blob server inside 
TaskExecutor is the akka RPC address which is invalid here.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (FLINK-6046) Add support for oversized messages during deployment

2017-03-14 Thread Nico Kruber (JIRA)
Nico Kruber created FLINK-6046:
--

 Summary: Add support for oversized messages during deployment
 Key: FLINK-6046
 URL: https://issues.apache.org/jira/browse/FLINK-6046
 Project: Flink
  Issue Type: New Feature
  Components: Distributed Coordination
Reporter: Nico Kruber
Assignee: Nico Kruber


This is the non-FLIP6 version of FLINK-4346, restricted to deployment messages:

Currently, messages larger than the maximum Akka Framesize cause an error when 
being transported. We should add a way to pass messages that are larger than 
{{akka.framesize}} as may happen for task deployments via the 
{{TaskDeploymentDescriptor}}.

We should use the {{BlobServer}} to offload big data items (if possible) and 
make use of any potential distributed file system behind. This way, not only do 
we avoid the akka framesize restriction, but may also be able to speed up 
deployment.

I suggest the following changes:
  - the sender, i.e. the {{Execution}} class, tries to store the serialized job 
information and serialized task information (if oversized) from the 
{{TaskDeploymentDescriptor}} (tdd) on the {{BlobServer}} as a single 
{{NAME_ADDRESSABLE}} blob under its job ID (if this does not work, we send the 
whole tdd as usual via akka)
  - if stored in a blob, these data items are removed from the tdd
  - the receiver, i.e. the {{TaskManager}} class, tries to retrieve any 
offloaded data after receiving the {{TaskDeploymentDescriptor}} from akka; it 
re-assembles the original tdd
  - as all {{NAME_ADDRESSABLE}} blobs, these offloaded blobs are removed when 
the job enters a final state

Further (future) changes may include:
  - separating the serialized job information and serialized task information 
into two files and re-use the first one for all tasks
  - not re-deploying these two during job recovery (if possible)



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (FLINK-6008) collection of BlobServer improvements

2017-03-09 Thread Nico Kruber (JIRA)
Nico Kruber created FLINK-6008:
--

 Summary: collection of BlobServer improvements
 Key: FLINK-6008
 URL: https://issues.apache.org/jira/browse/FLINK-6008
 Project: Flink
  Issue Type: Improvement
  Components: Network
Affects Versions: 1.3.0
Reporter: Nico Kruber
Assignee: Nico Kruber


The following things should be removed around the BlobServer/BlobCache:
* update config uptions with non-deprecated ones, e.g. 
{{high-availability.cluster-id}} and {{high-availability.storageDir}}
* promote {{BlobStore#deleteAll(JobID)}} to the {{BlobService}}
* extend the {{BlobService}} to work with {{NAME_ADDRESSABLE}} blobs (prepares 
FLINK-4399]
* remove {{NAME_ADDRESSABLE}} blobs after job/task termination
* do not fail the {{BlobServer}} when a delete operation fails
* code style, like using {{Preconditions.checkArgument}}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (FLINK-6005) unit test ArrayList initializations without default size

2017-03-09 Thread Nico Kruber (JIRA)
Nico Kruber created FLINK-6005:
--

 Summary: unit test ArrayList initializations without default size
 Key: FLINK-6005
 URL: https://issues.apache.org/jira/browse/FLINK-6005
 Project: Flink
  Issue Type: Improvement
  Components: Tests
Affects Versions: 1.3.0
Reporter: Nico Kruber
Assignee: Nico Kruber
Priority: Minor


I found some ArrayList initializations without a default size although it is 
possible to select one. The following PR will show some cases that I'd like to 
fix.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (FLINK-5973) check whether the direct memory size is always correctly calculated

2017-03-06 Thread Nico Kruber (JIRA)
Nico Kruber created FLINK-5973:
--

 Summary: check whether the direct memory size is always correctly 
calculated
 Key: FLINK-5973
 URL: https://issues.apache.org/jira/browse/FLINK-5973
 Project: Flink
  Issue Type: Bug
  Components: Local Runtime
Reporter: Nico Kruber
Priority: Minor


A user on the mailing list ran into the problem that the {{directMemorySize}} 
was incorrectly set too high which may happen if the following code path gets 
{{maxMemory}} from 1/4*> instead of the calculation, 
{{taskmanager.sh}} is doing (in his case via the discouraged {{start-local.sh}} 
script).

It be the case that other code paths also exhibit this issue, which should be 
checked.

{code:title=TaskManagerServices#createMemoryManager()}
} else if (memType == MemoryType.OFF_HEAP) {
// The maximum heap memory has been adjusted according to the fraction
long maxMemory = EnvironmentInformation.getMaxJvmHeapMemory();
long directMemorySize = (long) (maxMemory / (1.0 - memoryFraction) * 
memoryFraction);
if (preAllocateMemory) {
LOG.info("Using {} of the maximum memory size for managed 
off-heap memory ({} MB)." ,
memoryFraction, directMemorySize >> 20);
} else {
LOG.info("Limiting managed memory to {} of the maximum memory 
size ({} MB)," +
" memory will be allocated lazily.", memoryFraction, 
directMemorySize >> 20);
}
memorySize = directMemorySize;
} else {
{code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (FLINK-5814) flink-dist creates wrong symlink when not used with cleaned before

2017-02-16 Thread Nico Kruber (JIRA)
Nico Kruber created FLINK-5814:
--

 Summary: flink-dist creates wrong symlink when not used with 
cleaned before
 Key: FLINK-5814
 URL: https://issues.apache.org/jira/browse/FLINK-5814
 Project: Flink
  Issue Type: Bug
  Components: Build System
Affects Versions: 1.2.0
Reporter: Nico Kruber
Assignee: Nico Kruber
Priority: Minor


If {{/build-target}} already exists, 'mvn package' for flink-dist  
will create a symbolic link *inside* {{/build-target}} instead of 
replacing that symlink. This is due to the behaviour of {{ln \-sf}} for target 
links that point to directories and may be solved by adding the 
{{--no-dereference}} parameter.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (FLINK-5651) state backends: forbid Iterator#remove() from the Iterable returned by HeapListState#get()

2017-01-26 Thread Nico Kruber (JIRA)
Nico Kruber created FLINK-5651:
--

 Summary: state backends: forbid Iterator#remove() from the 
Iterable returned by HeapListState#get()
 Key: FLINK-5651
 URL: https://issues.apache.org/jira/browse/FLINK-5651
 Project: Flink
  Issue Type: Improvement
  Components: State Backends, Checkpointing
Affects Versions: 1.2.0
Reporter: Nico Kruber
Assignee: Nico Kruber


The idiom behind AppendingState#get() is to return a copy of the value behind 
or at least not to allow changes to the underlying state storage. However, the 
heap state backend returns the original list and thus is prone to changes.

The returned Iterable offers Iterable#iterator() which in turn offers 
Iterator#remove() that would change the stored state. While we cannot 
efficiently block the user from modifying the objects in the list, we can at 
least prevent him from doing structural changes to the list by forbidding 
Iterator#remove() there.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (FLINK-5642) queryable state: race condition with HeadListState

2017-01-25 Thread Nico Kruber (JIRA)
Nico Kruber created FLINK-5642:
--

 Summary: queryable state: race condition with HeadListState
 Key: FLINK-5642
 URL: https://issues.apache.org/jira/browse/FLINK-5642
 Project: Flink
  Issue Type: Bug
  Components: Queryable State
Affects Versions: 1.2.0
Reporter: Nico Kruber
Assignee: Nico Kruber


If queryable state accesses a HeapListState instance that is being modified 
during the value's serialisation, it may crash, e.g. with a 
NullPointerException during the serialisation or with an EOFException during 
de-serialisation.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (FLINK-5615) queryable state: execute the QueryableStateITCase for all three state back-ends

2017-01-23 Thread Nico Kruber (JIRA)
Nico Kruber created FLINK-5615:
--

 Summary: queryable state: execute the QueryableStateITCase for all 
three state back-ends
 Key: FLINK-5615
 URL: https://issues.apache.org/jira/browse/FLINK-5615
 Project: Flink
  Issue Type: Improvement
  Components: Queryable State
Affects Versions: 1.2.0
Reporter: Nico Kruber
Assignee: Nico Kruber


The QueryableStateITCase currently is only tested with the MemoryStateBackend 
but as has been seen in the past, some errors or inconsistent behaviour only 
appeared with different state back-ends. It should thus be extended to be 
tested with all three currently existing state back-ends.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (FLINK-5613) QueryableState: requesting a non-existing key in RocksDBStateBackend is not consistent with the MemoryStateBackend and FsStateBackend

2017-01-23 Thread Nico Kruber (JIRA)
Nico Kruber created FLINK-5613:
--

 Summary: QueryableState: requesting a non-existing key in 
RocksDBStateBackend is not consistent with the MemoryStateBackend and 
FsStateBackend
 Key: FLINK-5613
 URL: https://issues.apache.org/jira/browse/FLINK-5613
 Project: Flink
  Issue Type: Bug
  Components: Queryable State
Affects Versions: 1.2.0
Reporter: Nico Kruber
Assignee: Nico Kruber


Querying for a non-existing key for a state that has a default value set 
currently results in the default value being returned in the 
RocksDBStateBackend only. MemoryStateBackend or FsStateBackend will return null 
which results in an UnknownKeyOrNamespace exception.

Default values are now deprecated and will be removed eventually so we should 
not introduce them into the new queryable state API and thus adapt the 
RocksDBStateBackend accordingly



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (FLINK-5591) queryable state: request failures failing on the server side do not contain client stack trace

2017-01-20 Thread Nico Kruber (JIRA)
Nico Kruber created FLINK-5591:
--

 Summary: queryable state: request failures failing on the server 
side do not contain client stack trace
 Key: FLINK-5591
 URL: https://issues.apache.org/jira/browse/FLINK-5591
 Project: Flink
  Issue Type: Improvement
  Components: Queryable State
Affects Versions: 1.2.0
Reporter: Nico Kruber
Assignee: Nico Kruber


Failures during queries using QueryableStateClient throw exceptions like 
UnknownKeyOrNamespace but these are thrown on the server side with its 
stacktrace included, then serialised and sent over to the client where they are 
deserialised and re-thrown.
For debugging it would help a lot if these "server-exceptions" would be 
encapsulated in new client-exception instances so that we also get the client's 
stack trace. Especially since failing requests may be caused by the client 
itself, e.g. non-existing jobs/keys, wrong namespace/serialisers, etc.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (FLINK-5576) extend deserialization functions of KvStateRequestSerializer to detect unconsumed bytes

2017-01-19 Thread Nico Kruber (JIRA)
Nico Kruber created FLINK-5576:
--

 Summary: extend deserialization functions of 
KvStateRequestSerializer to detect unconsumed bytes
 Key: FLINK-5576
 URL: https://issues.apache.org/jira/browse/FLINK-5576
 Project: Flink
  Issue Type: Improvement
  Components: Queryable State
Affects Versions: 1.2.0
Reporter: Nico Kruber
Assignee: Nico Kruber
Priority: Minor


KvStateRequestSerializer#deserializeValue and 
KvStateRequestSerializer#deserializeList both deserialize a given byte array. 
This is used by clients and unit tests and it is fair to assume that these byte 
arrays represent a complete value since we do not offer a method to continue 
reading form the middle of the array anyway. Therefore, we can treat unconsumed 
bytes as errors, e.g. from a wrong serializer being used.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (FLINK-5561) DataInputDeserializer#available returns one too few

2017-01-18 Thread Nico Kruber (JIRA)
Nico Kruber created FLINK-5561:
--

 Summary: DataInputDeserializer#available returns one too few
 Key: FLINK-5561
 URL: https://issues.apache.org/jira/browse/FLINK-5561
 Project: Flink
  Issue Type: Bug
Affects Versions: 1.2.0
Reporter: Nico Kruber
Assignee: Nico Kruber


DataInputDeserializer#available seems to assume that the position points to the 
last read byte but instead it points to the next byte. Therefore, it returns a 
value which is 1 smaller than the correct one.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (FLINK-5559) queryable state: KvStateRequestSerializer#deserializeKeyAndNamespace() throws an IOException without own failure message if deserialisation fails

2017-01-18 Thread Nico Kruber (JIRA)
Nico Kruber created FLINK-5559:
--

 Summary: queryable state: 
KvStateRequestSerializer#deserializeKeyAndNamespace() throws an IOException 
without own failure message if deserialisation fails
 Key: FLINK-5559
 URL: https://issues.apache.org/jira/browse/FLINK-5559
 Project: Flink
  Issue Type: Improvement
  Components: Queryable State
Affects Versions: 1.2.0
Reporter: Nico Kruber
Assignee: Nico Kruber
Priority: Minor


KvStateRequestSerializer#deserializeKeyAndNamespace() throws an IOException, 
e.g. EOFException, if the deserialisation fails, e.g. there are not enough 
available bytes.
In these cases, it should instead also throw an IllegalArgumentException with a 
message containing "This indicates a mismatch in the key/namespace serializers 
used by the KvState instance and this access." as the other error cases.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (FLINK-5530) race condition in AbstractRocksDBState#getSerializedValue

2017-01-17 Thread Nico Kruber (JIRA)
Nico Kruber created FLINK-5530:
--

 Summary: race condition in AbstractRocksDBState#getSerializedValue
 Key: FLINK-5530
 URL: https://issues.apache.org/jira/browse/FLINK-5530
 Project: Flink
  Issue Type: Bug
  Components: Queryable State
Affects Versions: 1.2.0
Reporter: Nico Kruber
Assignee: Nico Kruber
Priority: Blocker


AbstractRocksDBState#getSerializedValue() uses the same key serialisation 
stream as the ordinary state access methods but is called in parallel during 
state queries thus violating the assumption of only one thread accessing it. 

This may lead to either wrong results in queries or corrupt data while queries 
are executed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (FLINK-5528) tests: reduce the retry delay in QueryableStateITCase

2017-01-17 Thread Nico Kruber (JIRA)
Nico Kruber created FLINK-5528:
--

 Summary: tests: reduce the retry delay in QueryableStateITCase
 Key: FLINK-5528
 URL: https://issues.apache.org/jira/browse/FLINK-5528
 Project: Flink
  Issue Type: Improvement
  Components: Queryable State
Affects Versions: 1.2.0
Reporter: Nico Kruber
Assignee: Nico Kruber


The QueryableStateITCase uses a retry of 1 second, e.g. if a queried key does 
not exist yet. This seems a bit too conservative as the job may not take that 
long to deploy and especially since getKvStateWithRetries() recovers from 
failures by retrying.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (FLINK-5527) QueryableState: requesting a non-existing key in MemoryStateBackend or FsStateBackend does not return the default value

2017-01-17 Thread Nico Kruber (JIRA)
Nico Kruber created FLINK-5527:
--

 Summary: QueryableState: requesting a non-existing key in 
MemoryStateBackend or FsStateBackend does not return the default value
 Key: FLINK-5527
 URL: https://issues.apache.org/jira/browse/FLINK-5527
 Project: Flink
  Issue Type: Improvement
  Components: Queryable State
Affects Versions: 1.2.0
Reporter: Nico Kruber
Assignee: Nico Kruber


Querying for a non-existing key for a state that has a default value set 
currently results in an UnknownKeyOrNamespace exception when the 
MemoryStateBackend or FsStateBackend is used. It should return the default 
value instead just like the RocksDBStateBackend.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (FLINK-5526) QueryableState: notify upon receiving a query but having queryable state disabled

2017-01-17 Thread Nico Kruber (JIRA)
Nico Kruber created FLINK-5526:
--

 Summary: QueryableState: notify upon receiving a query but having 
queryable state disabled
 Key: FLINK-5526
 URL: https://issues.apache.org/jira/browse/FLINK-5526
 Project: Flink
  Issue Type: Improvement
  Components: Queryable State
Affects Versions: 1.2.0
Reporter: Nico Kruber
Priority: Minor


When querying state but having it disabled in the config, a warning should be 
presented to the user that a query was received but the component is disabled. 
This is in addition to the query itself failing with a rather generic exception 
that is not pointing to this fact.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (FLINK-5521) remove unused KvStateRequestSerializer#serializeList

2017-01-17 Thread Nico Kruber (JIRA)
Nico Kruber created FLINK-5521:
--

 Summary: remove unused KvStateRequestSerializer#serializeList
 Key: FLINK-5521
 URL: https://issues.apache.org/jira/browse/FLINK-5521
 Project: Flink
  Issue Type: Improvement
  Components: Queryable State
Affects Versions: 1.2.0
Reporter: Nico Kruber
Assignee: Nico Kruber


KvStateRequestSerializer#serializeList is unused and instead the state 
backends' serialisation functions are used. Therefore, remove this one and make 
sure KvStateRequestSerializer#deserializeList works with the state backends' 
ones.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (FLINK-5515) fix unused kvState.getSerializedValue call in KvStateServerHandler

2017-01-16 Thread Nico Kruber (JIRA)
Nico Kruber created FLINK-5515:
--

 Summary: fix unused kvState.getSerializedValue call in 
KvStateServerHandler
 Key: FLINK-5515
 URL: https://issues.apache.org/jira/browse/FLINK-5515
 Project: Flink
  Issue Type: Improvement
Reporter: Nico Kruber


This was added in 4809f5367b08a9734fc1bd4875be51a9f3bb65aa and is probably a 
left-over from a merge.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (FLINK-5507) remove queryable list state sink

2017-01-16 Thread Nico Kruber (JIRA)
Nico Kruber created FLINK-5507:
--

 Summary: remove queryable list state sink
 Key: FLINK-5507
 URL: https://issues.apache.org/jira/browse/FLINK-5507
 Project: Flink
  Issue Type: Improvement
  Components: State Backends, Checkpointing
Affects Versions: 1.2.0
Reporter: Nico Kruber
Assignee: Nico Kruber


The queryable state "sink" using ListState 
(".asQueryableState(, ListStateDescriptor)") stores all incoming data 
forever and is never cleaned. Eventually, it will pile up too much memory and 
is thus of limited use.

We should remove it from the API.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (FLINK-5482) QueryableStateClient does not recover from a failed lookup due to a non-running job

2017-01-13 Thread Nico Kruber (JIRA)
Nico Kruber created FLINK-5482:
--

 Summary: QueryableStateClient does not recover from a failed 
lookup due to a non-running job
 Key: FLINK-5482
 URL: https://issues.apache.org/jira/browse/FLINK-5482
 Project: Flink
  Issue Type: Bug
Affects Versions: 1.2.0
Reporter: Nico Kruber
Assignee: Nico Kruber


When the QueryableStateClient is used to issue a query but the job is not 
running yet, its internal lookup result is cached with an IllegalStateException 
that the job was not found. It does, however, never recover from that.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (FLINK-5334) outdated scala SBT quickstart example

2016-12-14 Thread Nico Kruber (JIRA)
Nico Kruber created FLINK-5334:
--

 Summary: outdated scala SBT quickstart example
 Key: FLINK-5334
 URL: https://issues.apache.org/jira/browse/FLINK-5334
 Project: Flink
  Issue Type: Bug
  Components: Quickstarts
Reporter: Nico Kruber


The scala quickstart set up via sbt-quickstart.sh or from the repository at 
https://github.com/tillrohrmann/flink-project seems outdated compared to what 
is set up with the maven archetype, e.g. Job.scala vs. BatchJob.scala and 
StreamingJob.scala. This should probably be updated and also the hard-coded 
example in sbt-quickstart.sh on the web page could be removed and download the 
newest version instead as the mvn command does.

see 
https://ci.apache.org/projects/flink/flink-docs-release-1.2/quickstart/scala_api_quickstart.html
 for these two paths (SBT vs. Maven)




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (FLINK-5309) documentation links on the home page point to 1.2-SNAPSHOT

2016-12-09 Thread Nico Kruber (JIRA)
Nico Kruber created FLINK-5309:
--

 Summary: documentation links on the home page point to 1.2-SNAPSHOT
 Key: FLINK-5309
 URL: https://issues.apache.org/jira/browse/FLINK-5309
 Project: Flink
  Issue Type: Bug
  Components: Project Website
Reporter: Nico Kruber


The main website at https://flink.apache.org/ has several links to the 
documentation but despite advertising a stable release download, all of those 
links point to the 1.2 branch.
This should be set to the same stable version's documentation instead.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (FLINK-5308) download links to previous releases are incomplete

2016-12-09 Thread Nico Kruber (JIRA)
Nico Kruber created FLINK-5308:
--

 Summary: download links to previous releases are incomplete
 Key: FLINK-5308
 URL: https://issues.apache.org/jira/browse/FLINK-5308
 Project: Flink
  Issue Type: Bug
  Components: Project Website
Reporter: Nico Kruber
Assignee: Nico Kruber


The list of all releases under 
https://flink.apache.org/downloads.html#all-releases
does not contain several previous releases.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (FLINK-5212) Flakey ScalaShellITCase#testPreventRecreationBatch (out of Java heap space)

2016-11-30 Thread Nico Kruber (JIRA)
Nico Kruber created FLINK-5212:
--

 Summary: Flakey ScalaShellITCase#testPreventRecreationBatch (out 
of Java heap space)
 Key: FLINK-5212
 URL: https://issues.apache.org/jira/browse/FLINK-5212
 Project: Flink
  Issue Type: Bug
  Components: Flink on Tez, Scala Shell
Affects Versions: 2.0.0
 Environment: TravisCI
Reporter: Nico Kruber


{code:none}
Tests run: 9, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 255.399 sec <<< 
FAILURE! - in org.apache.flink.api.scala.ScalaShellITCase
testPreventRecreationBatch(org.apache.flink.api.scala.ScalaShellITCase)  Time 
elapsed: 198.128 sec  <<< ERROR!
java.lang.OutOfMemoryError: Java heap space
at scala.reflect.internal.Names$class.enterChars(Names.scala:70)
at scala.reflect.internal.Names$class.body$1(Names.scala:116)
at scala.reflect.internal.Names$class.newTermName(Names.scala:127)
at scala.reflect.internal.SymbolTable.newTermName(SymbolTable.scala:16)
at scala.reflect.internal.Names$class.newTermName(Names.scala:83)
at scala.reflect.internal.SymbolTable.newTermName(SymbolTable.scala:16)
at scala.reflect.internal.Names$class.newTermName(Names.scala:144)
at scala.reflect.internal.SymbolTable.newTermName(SymbolTable.scala:16)
at 
scala.tools.nsc.symtab.classfile.ClassfileParser$ConstantPool.getName(ClassfileParser.scala:206)
at 
scala.tools.nsc.symtab.classfile.ClassfileParser$ConstantPool.getExternalName(ClassfileParser.scala:216)
at 
scala.tools.nsc.symtab.classfile.ClassfileParser$ConstantPool.getType(ClassfileParser.scala:286)
at 
scala.tools.nsc.symtab.classfile.ClassfileParser.parseMethod(ClassfileParser.scala:565)
at 
scala.tools.nsc.symtab.classfile.ClassfileParser.scala$tools$nsc$symtab$classfile$ClassfileParser$$queueLoad$1(ClassfileParser.scala:480)
at 
scala.tools.nsc.symtab.classfile.ClassfileParser$$anonfun$parseClass$1.apply$mcV$sp(ClassfileParser.scala:490)
at 
scala.tools.nsc.symtab.classfile.ClassfileParser.parseClass(ClassfileParser.scala:495)
at 
scala.tools.nsc.symtab.classfile.ClassfileParser.parse(ClassfileParser.scala:136)
at 
scala.tools.nsc.symtab.SymbolLoaders$ClassfileLoader$$anonfun$doComplete$2.apply$mcV$sp(SymbolLoaders.scala:347)
at 
scala.tools.nsc.symtab.SymbolLoaders$ClassfileLoader$$anonfun$doComplete$2.apply(SymbolLoaders.scala:347)
at 
scala.tools.nsc.symtab.SymbolLoaders$ClassfileLoader$$anonfun$doComplete$2.apply(SymbolLoaders.scala:347)
at 
scala.reflect.internal.SymbolTable.enteringPhase(SymbolTable.scala:235)
at 
scala.tools.nsc.symtab.SymbolLoaders$ClassfileLoader.doComplete(SymbolLoaders.scala:347)
at 
scala.tools.nsc.symtab.SymbolLoaders$SymbolLoader.complete(SymbolLoaders.scala:211)
at 
scala.tools.nsc.symtab.SymbolLoaders$SymbolLoader.load(SymbolLoaders.scala:227)
at scala.reflect.internal.Symbols$Symbol.typeParams(Symbols.scala:1708)
at 
scala.reflect.internal.Types$NoArgsTypeRef.typeParams(Types.scala:1926)
at 
scala.reflect.internal.Types$NoArgsTypeRef.isHigherKinded(Types.scala:1925)
at 
scala.reflect.internal.transform.UnCurry$class.scala$reflect$internal$transform$UnCurry$$expandAlias(UnCurry.scala:22)
at 
scala.reflect.internal.transform.UnCurry$$anon$2.apply(UnCurry.scala:26)
at 
scala.reflect.internal.tpe.TypeMaps$TypeMap.applyToSymbolInfo(TypeMaps.scala:218)
at 
scala.reflect.internal.tpe.TypeMaps$TypeMap.loop$1(TypeMaps.scala:227)
at 
scala.reflect.internal.tpe.TypeMaps$TypeMap.noChangeToSymbols(TypeMaps.scala:229)
at 
scala.reflect.internal.tpe.TypeMaps$TypeMap.mapOver(TypeMaps.scala:243)


Results :

Tests in error: 
  ScalaShellITCase.testPreventRecreationBatch » OutOfMemory Java heap space
{code}

stdout:
https://api.travis-ci.org/jobs/180090640/log.txt?deansi=true

full logs:
https://transfer.sh/nu2wr/34.1.tar.gz



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (FLINK-5206) Flakey PythonPlanBinderTest

2016-11-29 Thread Nico Kruber (JIRA)
Nico Kruber created FLINK-5206:
--

 Summary: Flakey PythonPlanBinderTest
 Key: FLINK-5206
 URL: https://issues.apache.org/jira/browse/FLINK-5206
 Project: Flink
  Issue Type: Bug
 Environment: in TravisCI
Reporter: Nico Kruber


{code:none}
---
 T E S T S
---
Running org.apache.flink.python.api.PythonPlanBinderTest
Job execution failed.
org.apache.flink.runtime.client.JobExecutionException: Job execution failed.
at 
org.apache.flink.runtime.jobmanager.JobManager$$anonfun$handleMessage$1$$anonfun$applyOrElse$7.apply$mcV$sp(JobManager.scala:903)
at 
org.apache.flink.runtime.jobmanager.JobManager$$anonfun$handleMessage$1$$anonfun$applyOrElse$7.apply(JobManager.scala:846)
at 
org.apache.flink.runtime.jobmanager.JobManager$$anonfun$handleMessage$1$$anonfun$applyOrElse$7.apply(JobManager.scala:846)
at 
scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24)
at 
scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24)
at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:41)
at 
akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:401)
at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
at 
scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
at 
scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
at 
scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
Caused by: java.io.IOException: Output path '/tmp/flink/result2' could not be 
initialized. Canceling task...
at 
org.apache.flink.api.common.io.FileOutputFormat.open(FileOutputFormat.java:233)
at 
org.apache.flink.api.java.io.TextOutputFormat.open(TextOutputFormat.java:78)
at 
org.apache.flink.runtime.operators.DataSinkTask.invoke(DataSinkTask.java:178)
at org.apache.flink.runtime.taskmanager.Task.run(Task.java:654)
at java.lang.Thread.run(Thread.java:745)
Tests run: 3, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 36.891 sec <<< 
FAILURE! - in org.apache.flink.python.api.PythonPlanBinderTest
testJobWithoutObjectReuse(org.apache.flink.python.api.PythonPlanBinderTest)  
Time elapsed: 11.53 sec  <<< FAILURE!
java.lang.AssertionError: Error while calling the test program: Job execution 
failed.
at org.junit.Assert.fail(Assert.java:88)
at 
org.apache.flink.test.util.JavaProgramTestBase.testJobWithoutObjectReuse(JavaProgramTestBase.java:180)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
at 
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
at 
org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
at 
org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
at org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:55)
at org.junit.rules.RunRules.evaluate(RunRules.java:20)
at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
at 
org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:283)
at 
org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:173)
at 
org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:153)
at 
org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:128)
at 
org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:203)
at 
org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:155)
at 

[jira] [Created] (FLINK-5204) Flakey YARNSessionCapacitySchedulerITCase (NullPointerException)

2016-11-29 Thread Nico Kruber (JIRA)
Nico Kruber created FLINK-5204:
--

 Summary: Flakey YARNSessionCapacitySchedulerITCase 
(NullPointerException)
 Key: FLINK-5204
 URL: https://issues.apache.org/jira/browse/FLINK-5204
 Project: Flink
  Issue Type: Bug
  Components: Tests, YARN
Affects Versions: 2.0.0
 Environment: TravisCI
Reporter: Nico Kruber


{code:none}
testTaskManagerFailure(org.apache.flink.yarn.YARNSessionCapacitySchedulerITCase)
  Time elapsed: 0.61 sec  <<< ERROR!
java.lang.NullPointerException: java.lang.NullPointerException
at 
org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptMetrics.getAggregateAppResourceUsage(RMAppAttemptMetrics.java:119)
at 
org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.getApplicationResourceUsageReport(RMAppAttemptImpl.java:814)
at 
org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.createAndGetApplicationReport(RMAppImpl.java:580)
at 
org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getApplications(ClientRMService.java:806)
at 
org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getApplications(ClientRMService.java:670)
at 
org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.getApplications(ApplicationClientProtocolPBServiceImpl.java:234)
at 
org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:425)
at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:969)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2049)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2045)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2045)

at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at 
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at 
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:408)
at 
org.apache.hadoop.yarn.ipc.RPCUtil.instantiateException(RPCUtil.java:53)
at 
org.apache.hadoop.yarn.ipc.RPCUtil.unwrapAndThrowException(RPCUtil.java:107)
at 
org.apache.hadoop.yarn.api.impl.pb.client.ApplicationClientProtocolPBClientImpl.getApplications(ApplicationClientProtocolPBClientImpl.java:254)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:483)
at 
org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:191)
at 
org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
at com.sun.proxy.$Proxy91.getApplications(Unknown Source)
at 
org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.getApplications(YarnClientImpl.java:478)
at 
org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.getApplications(YarnClientImpl.java:455)
at 
org.apache.flink.yarn.YarnTestBase.checkClusterEmpty(YarnTestBase.java:193)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:483)
at 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
at 
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
at 
org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
at 
org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:24)
at 
org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
at org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:55)
at org.junit.rules.RunRules.evaluate(RunRules.java:20)
at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
at 

[jira] [Created] (FLINK-5203) YARNHighAvailabilityITCase fails to deploy YARN cluster

2016-11-29 Thread Nico Kruber (JIRA)
Nico Kruber created FLINK-5203:
--

 Summary: YARNHighAvailabilityITCase fails to deploy YARN cluster
 Key: FLINK-5203
 URL: https://issues.apache.org/jira/browse/FLINK-5203
 Project: Flink
  Issue Type: Bug
  Components: Tests, YARN
Affects Versions: 2.0.0
 Environment: in TravisCI
Reporter: Nico Kruber


{code:none}
Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 49.883 sec <<< 
FAILURE! - in org.apache.flink.yarn.YARNHighAvailabilityITCase
testMultipleAMKill(org.apache.flink.yarn.YARNHighAvailabilityITCase)  Time 
elapsed: 42.191 sec  <<< ERROR!
java.lang.RuntimeException: Couldn't deploy Yarn cluster
at 
org.apache.flink.yarn.AbstractYarnClusterDescriptor.deployInternal(AbstractYarnClusterDescriptor.java:840)
at 
org.apache.flink.yarn.AbstractYarnClusterDescriptor.deploy(AbstractYarnClusterDescriptor.java:407)
at 
org.apache.flink.yarn.YARNHighAvailabilityITCase.testMultipleAMKill(YARNHighAvailabilityITCase.java:131)
{code}

stdout log:
https://api.travis-ci.org/jobs/179733979/log.txt?deansi=true

full logs:
https://transfer.sh/UNjFq/29.5.tar.gz




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (FLINK-5202) test timeout (no output for 300s) in YARNHighAvailabilityITCase

2016-11-29 Thread Nico Kruber (JIRA)
Nico Kruber created FLINK-5202:
--

 Summary: test timeout (no output for 300s) in 
YARNHighAvailabilityITCase
 Key: FLINK-5202
 URL: https://issues.apache.org/jira/browse/FLINK-5202
 Project: Flink
  Issue Type: Bug
  Components: Tests, YARN
Affects Versions: 2.0.0
 Environment: in TravisCI
Reporter: Nico Kruber


TravisCI occasionally fails since YARNHighAvailabilityITCase fails to produce 
an output (or finish) within 300s. See the logs at the URLs below

textual log:
https://api.travis-ci.org/jobs/179733965/log.txt?deansi=true

full log at https://transfer.sh/xcLvY/29.1.tar.gz



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (FLINK-5178) allow BLOB_STORAGE_DIRECTORY_KEY to point to a distributed file system

2016-11-28 Thread Nico Kruber (JIRA)
Nico Kruber created FLINK-5178:
--

 Summary: allow BLOB_STORAGE_DIRECTORY_KEY to point to a 
distributed file system
 Key: FLINK-5178
 URL: https://issues.apache.org/jira/browse/FLINK-5178
 Project: Flink
  Issue Type: Improvement
  Components: Network
Reporter: Nico Kruber
Assignee: Nico Kruber


After FLINK-5129, high availability (HA) mode adds the ability for the 
BlobCache instances at the task managers to download blobs directly from the 
distributed file system. It would be nice if this also worked in non-HA mode 
and BLOB_STORAGE_DIRECTORY_KEY may point to a distributed file system.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (FLINK-5081) unable to set yarn.maximum-failed-containers with flink one-time YARN setup

2016-11-16 Thread Nico Kruber (JIRA)
Nico Kruber created FLINK-5081:
--

 Summary: unable to set yarn.maximum-failed-containers with flink 
one-time YARN setup
 Key: FLINK-5081
 URL: https://issues.apache.org/jira/browse/FLINK-5081
 Project: Flink
  Issue Type: Bug
  Components: Startup Shell Scripts
Affects Versions: 1.1.4
Reporter: Nico Kruber


When letting flink setup YARN for a one-time job, it apparently does not 
deliver the {{yarn.maximum-failed-containers}} parameter to YARN as the 
{{yarn-session.sh}} script does. Adding it to conf/flink-conf.yaml as 
https://ci.apache.org/projects/flink/flink-docs-master/setup/yarn_setup.html#recovery-behavior-of-flink-on-yarn
 suggested also does not work.

example:
{code:none}
flink run -m yarn-cluster -yn 3 -yjm 1024 -ytm 4096 .jar --parallelism 3 
-Dyarn.maximum-failed-containers=100
{code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (FLINK-5066) prevent LocalInputChannel#getNextBuffer from de-serialising all events when looking for EndOfPartitionEvent only

2016-11-14 Thread Nico Kruber (JIRA)
Nico Kruber created FLINK-5066:
--

 Summary: prevent LocalInputChannel#getNextBuffer from 
de-serialising all events when looking for EndOfPartitionEvent only
 Key: FLINK-5066
 URL: https://issues.apache.org/jira/browse/FLINK-5066
 Project: Flink
  Issue Type: Improvement
  Components: Network
Reporter: Nico Kruber
Assignee: Nico Kruber


LocalInputChannel#getNextBuffer de-serialises all incoming events on the 
lookout for an EndOfPartitionEvent.

Instead, if EventSerializer offered a function to check for an event type only 
without de-serialising the whole event, we could save some resources.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (FLINK-5060) only serialise records once in RecordWriter#emit and RecordWriter#broadcastEmit

2016-11-14 Thread Nico Kruber (JIRA)
Nico Kruber created FLINK-5060:
--

 Summary: only serialise records once in RecordWriter#emit and 
RecordWriter#broadcastEmit
 Key: FLINK-5060
 URL: https://issues.apache.org/jira/browse/FLINK-5060
 Project: Flink
  Issue Type: Improvement
  Components: Network
Reporter: Nico Kruber
Assignee: Nico Kruber


Currently, org.apache.flink.runtime.io.network.api.writer.RecordWriter#emit and 
org.apache.flink.runtime.io.network.api.writer.RecordWriter#broadcastEmit 
serialise a record once per target channel. Instead, they could serialise the 
record only once and use the serialised form for every channel and thus save 
resources.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (FLINK-5059) only serialise events once in RecordWriter#broadcastEvent

2016-11-14 Thread Nico Kruber (JIRA)
Nico Kruber created FLINK-5059:
--

 Summary: only serialise events once in RecordWriter#broadcastEvent
 Key: FLINK-5059
 URL: https://issues.apache.org/jira/browse/FLINK-5059
 Project: Flink
  Issue Type: Improvement
  Components: Network
Reporter: Nico Kruber
Assignee: Nico Kruber


Currently, 
org.apache.flink.runtime.io.network.api.writer.RecordWriter#broadcastEvent 
serialises the event once per target channel. Instead, it could serialise the 
event only once and use the serialised form for every channel and thus save 
resources.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (FLINK-5008) Update IDE setup documentation

2016-11-03 Thread Nico Kruber (JIRA)
Nico Kruber created FLINK-5008:
--

 Summary: Update IDE setup documentation
 Key: FLINK-5008
 URL: https://issues.apache.org/jira/browse/FLINK-5008
 Project: Flink
  Issue Type: Improvement
  Components: Documentation
Reporter: Nico Kruber
Priority: Minor


The IDE setup documentation of Flink is outdated in both parts: IntelliJ IDEA 
was based on an old version and Eclipse/Scala IDE does not work at all anymore.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


<    1   2   3   4