[jira] [Created] (FLINK-7358) Add implicit conversion support for user-defined functions

2017-08-02 Thread sunjincheng (JIRA)
sunjincheng created FLINK-7358:
--

 Summary: Add implicit conversion support for user-defined functions
 Key: FLINK-7358
 URL: https://issues.apache.org/jira/browse/FLINK-7358
 Project: Flink
  Issue Type: Improvement
  Components: Table API & SQL
Reporter: sunjincheng
Assignee: sunjincheng


Currently, if a user defines a UDF as follows:
{code}
object Func extends ScalarFunction {
  def eval(a: Int, b: Long): String = {
...
  }
}
{code}

If the table schema is (a: Int, b: Int, c: String), then we cannot call the UDF 
as `Func('a, 'b)`, because the Int field 'b does not match the Long parameter. 
So I want to add implicit conversions when calling a UDF.
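
For illustration, a minimal sketch of the desired behavior in the Table API (the environment setup and the explicit {{cast}} workaround are assumptions for the example, not part of the proposal):
{code}
// Illustrative sketch only: table and environment names are assumed.
val table = tEnv.fromDataSet(input, 'a, 'b, 'c) // schema: (a: Int, b: Int, c: String)

// Fails today: the Int field 'b does not match the Long parameter of Func.eval.
// table.select(Func('a, 'b))

// Works today only with an explicit widening cast; the proposal is to insert
// such casts implicitly whenever the conversion is lossless (e.g. Int -> Long).
table.select(Func('a, 'b.cast(Types.LONG)))
{code}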

*Note:
This JIRA covers only the Table API; SQL will be fixed in 
https://issues.apache.org/jira/browse/CALCITE-1908.*

What do you think? [~fhueske]



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (FLINK-7357) HOP_START() HOP_END() does not work when using HAVING clause with GROUP BY HOP window

2017-08-02 Thread Rong Rong (JIRA)
Rong Rong created FLINK-7357:


 Summary: HOP_START() HOP_END() does not work when using HAVING 
clause with GROUP BY HOP window
 Key: FLINK-7357
 URL: https://issues.apache.org/jira/browse/FLINK-7357
 Project: Flink
  Issue Type: Bug
  Components: Table API & SQL
Affects Versions: 1.3.1
Reporter: Rong Rong


The following SQL does not compile:
{code:title=invalid_having_hop_start_sql}
SELECT 
  c AS k, 
  COUNT(a) AS v, 
  HOP_START(rowtime, INTERVAL '1' MINUTE, INTERVAL '1' MINUTE) AS windowStart, 
  HOP_END(rowtime, INTERVAL '1' MINUTE, INTERVAL '1' MINUTE) AS windowEnd 
FROM 
  T1 
GROUP BY 
  HOP(rowtime, INTERVAL '1' MINUTE, INTERVAL '1' MINUTE), 
  c 
HAVING 
  SUM(b) > 1
{code}
Keeping either the HAVING clause or the HOP_START/HOP_END fields individually, 
the query compiles and runs without issue.
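
For example, a sketch of the variant without the HAVING clause, submitted through the SQL entry point (method name as in Flink 1.3; environment setup is assumed):
{code}
// Illustrative sketch only: the same query without HAVING compiles and runs.
val result = tEnv.sql(
  """
    |SELECT
    |  c AS k,
    |  COUNT(a) AS v,
    |  HOP_START(rowtime, INTERVAL '1' MINUTE, INTERVAL '1' MINUTE) AS windowStart,
    |  HOP_END(rowtime, INTERVAL '1' MINUTE, INTERVAL '1' MINUTE) AS windowEnd
    |FROM T1
    |GROUP BY
    |  HOP(rowtime, INTERVAL '1' MINUTE, INTERVAL '1' MINUTE),
    |  c
  """.stripMargin)
{code}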

more details: 
https://github.com/apache/flink/compare/master...walterddr:having_does_not_work_with_hop_start_end



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


Re: [VOTE] Release 1.3.2, release candidate #2

2017-08-02 Thread Greg Hogan
-1

The Gelly examples jar is not included in the Scala 2.11 convenience binaries 
since change-scala-version.sh is not switching the hard-coded Scala version 
from 2.10 to 2.11 in ./flink-dist/src/main/assemblies/bin.xml. The simplest fix 
may be to revert FLINK-7211 and simply exclude the corresponding javadoc jar 
(this is only an issue in the 1.3 branch; FLINK-7211 should be working on 
master). I don’t think I’ll have time to submit this tomorrow.

I would like to look into adding an end-to-end test for Gelly for the next 
release.

Greg

> On Jul 30, 2017, at 3:07 AM, Aljoscha Krettek  wrote:
> 
> Hi everyone,
> 
> Please review and vote on the release candidate #2 for the version 1.3.2, as 
> follows:
> [ ] +1, Approve the release
> [ ] -1, Do not approve the release (please provide specific comments)
> 
> 
> The complete staging area is available for your review, which includes:
> * JIRA release notes [1],
> * the official Apache source release and binary convenience releases to be 
> deployed to dist.apache.org [2], which is signed with the key with 
> fingerprint 0xA8F4FD97121D7293 [3],
> * all artifacts to be deployed to the Maven Central Repository [4],
> * source code tag "release-1.3.2-rc2" [5],
> * website pull request listing the new release and adding announcement blog 
> post [6]. 
> 
> The vote will be open for at least 72 hours (excluding this current weekend). 
> It is adopted by majority approval, with at least 3 PMC affirmative votes.
> 
> Please use the provided document, as discussed before, for coordinating the 
> testing efforts: [7]
> 
> Thanks,
> Aljoscha
> 
> [1] 
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&version=12340984
> [2] http://people.apache.org/~aljoscha/flink-1.3.2-rc2/
> [3] https://dist.apache.org/repos/dist/release/flink/KEYS
> [4] https://repository.apache.org/content/repositories/orgapacheflink-1133/
> [5] 
> https://git-wip-us.apache.org/repos/asf?p=flink.git;a=tag;h=e38825d0c8e7fe2191a4c657984d9939ed8dd0ad
> [6] https://github.com/apache/flink-web/pull/75
> [7] 
> https://docs.google.com/document/d/1dN9AM9FUPizIu4hTKAXJSbbAORRdrce-BqQ8AUHlOqE/edit?usp=sharing



[jira] [Created] (FLINK-7356) misleading s3 file uri in configuration file

2017-08-02 Thread Bowen Li (JIRA)
Bowen Li created FLINK-7356:
---

 Summary: misleading s3 file uri in configuration file
 Key: FLINK-7356
 URL: https://issues.apache.org/jira/browse/FLINK-7356
 Project: Flink
  Issue Type: Bug
  Components: Configuration
Affects Versions: 1.3.1
Reporter: Bowen Li
Assignee: Bowen Li
 Fix For: 1.4.0, 1.3.2


in 
https://github.com/apache/flink/blob/master/flink-dist/src/main/resources/flink-conf.yaml,
the comment in line 121 should say {{"*s3*://" for S3 file system}} rather than 
{{"S3://" for S3 file system}}, because {{S3://xxx}} is not recognized by the 
AWS SDK.
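
As a quick illustration (plain JVM code with an assumed example path, not Flink's actual file system resolution logic), URI schemes are preserved case-sensitively, so the uppercase variant never matches the registered {{s3}} scheme:
{code}
// Illustrative sketch only.
val good = new java.net.URI("s3://my-bucket/checkpoints")
val bad  = new java.net.URI("S3://my-bucket/checkpoints")
println(good.getScheme) // "s3" -- resolves to the S3 file system
println(bad.getScheme)  // "S3" -- not recognized by the AWS SDK
{code}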



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


Re: [DISCUSS] Service Authorization (redux)

2017-08-02 Thread Eron Wright
Thanks Till and Aljoscha for the feedback.

Seems there are two ways to proceed here, if we accept mutual SSL as the
basis.

a) Backport mutual-auth support from Akka 2.4 to Flakka.
b) Drop support for Scala 2.10 (FLINK-?), move to Akka 2.4 (FLINK-3662).

Let's assume (a) for now.
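
For reference, a minimal sketch of what mutual authentication means at the JSSE level for the non-Akka endpoints (illustrative only, not actual Flink code; the SSLContext setup is assumed):
{code}
import javax.net.ssl.{SSLContext, SSLEngine}

// Server side of mutual auth: besides presenting its own certificate, the
// server also requires a trusted certificate from every connecting client.
def serverEngine(ctx: SSLContext): SSLEngine = {
  val engine = ctx.createSSLEngine()
  engine.setUseClientMode(false)
  engine.setNeedClientAuth(true) // reject clients without a trusted certificate
  engine
}
{code}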



On Tue, Aug 1, 2017 at 3:05 PM, Till Rohrmann  wrote:

> Dropping Java 7 alone is not enough to move to Akka 2.4+. For that we need
> at least Scala 2.11.
>
> Cheers,
> Till
>
> On Tue, Aug 1, 2017 at 4:22 PM, Aljoscha Krettek 
> wrote:
>
> > Hi Eron,
> >
> > I think after dropping support for Java 7 we will move to Akka 2.4+, so
> we
> > should be good there. I think quite some users should find a (more)
> secure
> > Flink interesting.
> >
> > Best,
> > Aljoscha
> > > On 24. Jul 2017, at 03:11, Eron Wright  wrote:
> > >
> > > Hello, now might be a good time to revisit an important enhancement to
> > > Flink security, so-called service authorization.   This means the
> > hardening
> > > of a Flink cluster against unauthorized use with some sort of
> > > authentication and authorization scheme.   Today, Flink relies entirely
> > on
> > > network isolation to protect itself from unauthorized job submission
> and
> > > control, and to protect the secrets contained within a Flink cluster.
> > > This is a problem in multi-user environments like YARN/Mesos/K8s.
> > >
> > > Last fall, an effort was made to implement service authorization but
> the
> > PR
> > > was ultimately rejected.   The idea was to add a simple secret key to
> all
> > > network communication between the client, JM, and TM.   Akka itself has
> > > such a feature which formed the basis of the solution.  There are
> > usability
> > > challenges with this solution, including a dependency on SSL.
> > >
> > > Since then, the situation has evolved somewhat, and the use of SSL
> mutual
> > > authentication is more viable.   Mutual auth is supported in Akka
> 2.4.12+
> > > (or could be backported to Flakka).  My proposal is:
> > >
> > > 1. Upgrade Akka or backport the functionality to Flakka (see commit
> > > 5d03902c5ec3212cd28f26c9b3ef7c3b628b9451).
> > > 2. Implement SSL on any endpoint that doesn't yet support it (e.g.
> > > queryable state).
> > > 3. Enable mutual auth in Akka and implement it on non-Akka endpoints.
> > > 4. Implement a simple authorization layer that accepts any
> authenticated
> > > connection.
> > > 5. (stretch) generate and store a certificate automatically in YARN
> mode.
> > > 6. (stretch) Develop an alternate authentication method for the Web UI.
> > >
> > > Are folks interested in this capability?  Thoughts on the use of SSL
> > mutual
> > > auth versus something else?  Thanks!
> > >
> > > -Eron
> >
> >
>


Re: [VOTE] Release 1.3.2, release candidate #2

2017-08-02 Thread Nico Kruber
+-1 (non-binding)

L.1 Check if checksums and GPG files match the corresponding release files
- found an issue with a malformed flink-1.3.2-src.tgz.md5 that GNU coreutils 
8.27 was not accepting; already fixed by now

L.3 Check if the source release is building properly (mvn clean verify)
- found these NON-BLOCKING issues: FLINK-6738, FLINK-7351, FLINK-7352
- also found FLINK-7354, which I could NOT get past (or only very rarely) on 
Debian 9 without fixing: an unstable unit test (may be BLOCKING when it happens 
this often?)

L.4 Verify the LICENSE and NOTICE files for the binary and source releases
OK

F.1 Run the start-local.sh, start-cluster.sh
OK

F.2 Examine the *.out files and the log files
OK

F.10 Run manual Tests in "flink-tests" module
(actually I looked for any ignored test in any module)
- NON-BLOCKING-ISSUE: FLINK-7355
- NON-BLOCKING-ISSUE: ChaosMonkeyITCase does not work (but removed in master)


Nico

On Sunday, 30 July 2017 09:07:03 CEST Aljoscha Krettek wrote:
> Hi everyone,
> 
> Please review and vote on the release candidate #2 for the version 1.3.2, as
> follows:
> [ ] +1, Approve the release
> [ ] -1, Do not approve the release (please provide specific comments)
> 
> 
> The complete staging area is available for your review, which includes:
> * JIRA release notes [1],
> * the official Apache source release and binary convenience releases to be
> deployed to dist.apache.org [2], which is signed with the key with
> fingerprint 0xA8F4FD97121D7293 [3],
> * all artifacts to be deployed to the Maven Central Repository [4],
> * source code tag "release-1.3.2-rc2" [5],
> * website pull request listing the new release and adding announcement blog
> post [6].
> 
> The vote will be open for at least 72 hours (excluding this current
> weekend). It is adopted by majority approval, with at least 3 PMC
> affirmative votes.
> 
> Please use the provided document, as discussed before, for coordinating the
> testing efforts: [7]
> 
> Thanks,
> Aljoscha
> 
> [1] https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&version=12340984
> [2] http://people.apache.org/~aljoscha/flink-1.3.2-rc2/
> [3] https://dist.apache.org/repos/dist/release/flink/KEYS
> [4] https://repository.apache.org/content/repositories/orgapacheflink-1133/
> [5] https://git-wip-us.apache.org/repos/asf?p=flink.git;a=tag;h=e38825d0c8e7fe2191a4c657984d9939ed8dd0ad
> [6] https://github.com/apache/flink-web/pull/75
> [7] https://docs.google.com/document/d/1dN9AM9FUPizIu4hTKAXJSbbAORRdrce-BqQ8AUHlOqE/edit?usp=sharing





[jira] [Created] (FLINK-7355) YARNSessionFIFOITCase#testfullAlloc does not run anymore

2017-08-02 Thread Nico Kruber (JIRA)
Nico Kruber created FLINK-7355:
--

 Summary: YARNSessionFIFOITCase#testfullAlloc does not run anymore
 Key: FLINK-7355
 URL: https://issues.apache.org/jira/browse/FLINK-7355
 Project: Flink
  Issue Type: Bug
  Components: Tests, YARN
Affects Versions: 1.4.0, 1.3.2
Reporter: Nico Kruber
Priority: Minor


{{YARNSessionFIFOITCase#testfullAlloc}} is a test case that is ignored because 
of its high resource consumption, but when run manually, it fails with

{code}
Error while deploying YARN cluster: Couldn't deploy Yarn session cluster
java.lang.RuntimeException: Couldn't deploy Yarn session cluster
at 
org.apache.flink.yarn.AbstractYarnClusterDescriptor.deploySessionCluster(AbstractYarnClusterDescriptor.java:367)
at 
org.apache.flink.yarn.cli.FlinkYarnSessionCli.run(FlinkYarnSessionCli.java:663)
at org.apache.flink.yarn.YarnTestBase$Runner.run(YarnTestBase.java:680)
Caused by: java.lang.IllegalArgumentException: The configuration value 
'containerized.heap-cutoff-min' is higher (600) than the requested amount of 
memory 256
at org.apache.flink.yarn.Utils.calculateHeapSize(Utils.java:101)
at 
org.apache.flink.yarn.AbstractYarnClusterDescriptor.setupApplicationMasterContainer(AbstractYarnClusterDescriptor.java:1356)
at 
org.apache.flink.yarn.AbstractYarnClusterDescriptor.startAppMaster(AbstractYarnClusterDescriptor.java:840)
at 
org.apache.flink.yarn.AbstractYarnClusterDescriptor.deployInternal(AbstractYarnClusterDescriptor.java:456)
at 
org.apache.flink.yarn.AbstractYarnClusterDescriptor.deploySessionCluster(AbstractYarnClusterDescriptor.java:362)
... 2 more
{code}

in current master and

{code}
Error while starting the YARN Client: The JobManager memory (256) is below the 
minimum required memory amount of 768 MB
java.lang.IllegalArgumentException: The JobManager memory (256) is below the 
minimum required memory amount of 768 MB
at 
org.apache.flink.yarn.AbstractYarnClusterDescriptor.setJobManagerMemory(AbstractYarnClusterDescriptor.java:187)
at 
org.apache.flink.yarn.cli.FlinkYarnSessionCli.createDescriptor(FlinkYarnSessionCli.java:314)
at 
org.apache.flink.yarn.cli.FlinkYarnSessionCli.run(FlinkYarnSessionCli.java:622)
at org.apache.flink.yarn.YarnTestBase$Runner.run(YarnTestBase.java:645)
{code}

in Flink 1.3.2 RC2
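
For context, a rough paraphrase of the heap cut-off rule behind the first error (method and parameter names are illustrative, not the actual implementation in {{org.apache.flink.yarn.Utils}}):
{code}
// Illustrative paraphrase only.
def calculateHeapSize(memoryMb: Int, cutoffRatio: Double, cutoffMinMb: Int): Int = {
  // The cut-off is at least 'containerized.heap-cutoff-min' (here: 600 MB).
  val cutoff = math.max((memoryMb * cutoffRatio).toInt, cutoffMinMb)
  require(cutoff < memoryMb,
    s"The configuration value 'containerized.heap-cutoff-min' is higher ($cutoff) " +
      s"than the requested amount of memory $memoryMb")
  memoryMb - cutoff
}
// With the test's 256 MB container: cutoff = max(.., 600) = 600 >= 256 -> failure.
{code}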



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


Re: [VOTE] Release 1.3.2, release candidate #2

2017-08-02 Thread Tzu-Li (Gordon) Tai
+1

1. Cluster tests
Tested with the stateful state machine job, using the following settings and 
variations:
- Cloud env: GCE
- Distributions: CDH 5.9.0
- Flink deployment method: YARN per-job (Hadoop 2.6.0)
- HA: both enabled & disabled
- Kerberos: enabled, keytab mode
- Kafka version: 0.9.0.1
- State Backends: Heap (Sync / Async) & RocksDB (incremental / full)
- Filesystem: HDFS (Hadoop 2.6.0)
- Externalized checkpoints: enabled & disabled

Logs also look clean. All tests pass (the state machine job processes 
records and does not fail).
Also verified that the job fails when using 1.3.1 with Kafka brokers 
deliberately shut down on job restore, due to FLINK-7195.

2. Build with Scala 2.10 successful

3. Minor: new Kafka consumer commitsFailed and commitsSuccessful metrics
Works nicely and can be queried in the REST API / web UI.

4. Checked README.md, LICENSE, NOTICE
OK, but we need to remember to bring README.md up to date with the recent 
important new features (see FLINK-7353)

5. Website release announcement
Left comments on the PR; they can be addressed independently of the vote.

Cheers,
Gordon

On 30 July 2017 at 3:07:08 PM, Aljoscha Krettek (aljos...@apache.org) wrote:

Hi everyone,  

Please review and vote on the release candidate #2 for the version 1.3.2, as 
follows:  
[ ] +1, Approve the release  
[ ] -1, Do not approve the release (please provide specific comments)  


The complete staging area is available for your review, which includes:  
* JIRA release notes [1],  
* the official Apache source release and binary convenience releases to be 
deployed to dist.apache.org [2], which is signed with the key with fingerprint 
0xA8F4FD97121D7293 [3],  
* all artifacts to be deployed to the Maven Central Repository [4],  
* source code tag "release-1.3.2-rc2" [5],  
* website pull request listing the new release and adding announcement blog 
post [6].  

The vote will be open for at least 72 hours (excluding this current weekend). 
It is adopted by majority approval, with at least 3 PMC affirmative votes.  

Please use the provided document, as discussed before, for coordinating the 
testing efforts: [7]  

Thanks,  
Aljoscha  

[1] 
https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&version=12340984
  
[2] http://people.apache.org/~aljoscha/flink-1.3.2-rc2/  
[3] https://dist.apache.org/repos/dist/release/flink/KEYS  
[4] https://repository.apache.org/content/repositories/orgapacheflink-1133/  
[5] 
https://git-wip-us.apache.org/repos/asf?p=flink.git;a=tag;h=e38825d0c8e7fe2191a4c657984d9939ed8dd0ad
  
[6] https://github.com/apache/flink-web/pull/75  
[7] 
https://docs.google.com/document/d/1dN9AM9FUPizIu4hTKAXJSbbAORRdrce-BqQ8AUHlOqE/edit?usp=sharing

[jira] [Created] (FLINK-7354) test instability in LocalFlinkMiniClusterITCase#testLocalFlinkMiniClusterWithMultipleTaskManagers

2017-08-02 Thread Nico Kruber (JIRA)
Nico Kruber created FLINK-7354:
--

 Summary: test instability in 
LocalFlinkMiniClusterITCase#testLocalFlinkMiniClusterWithMultipleTaskManagers
 Key: FLINK-7354
 URL: https://issues.apache.org/jira/browse/FLINK-7354
 Project: Flink
  Issue Type: Bug
  Components: Tests
Affects Versions: 1.2.1, 1.1.4, 1.4.0, 1.3.2
Reporter: Nico Kruber
Assignee: Nico Kruber
Priority: Critical


During {{mvn clean install}} on the 1.3.2 RC2, I found an intermittently 
failing test at 
{{LocalFlinkMiniClusterITCase#testLocalFlinkMiniClusterWithMultipleTaskManagers}}:

{code}
testLocalFlinkMiniClusterWithMultipleTaskManagers(org.apache.flink.test.runtime.minicluster.LocalFlinkMiniClusterITCase)
  Time elapsed: 34.978 sec  <<< FAILURE!
java.lang.AssertionError: Thread Thread[initialSeedUniquifierGenerator,5,main] 
was started by the mini cluster, but not shut down
at org.junit.Assert.fail(Assert.java:88)
at 
org.apache.flink.test.runtime.minicluster.LocalFlinkMiniClusterITCase.testLocalFlinkMiniClusterWithMultipleTaskManagers(LocalFlinkMiniClusterITCase.java:168)
{code}

Searching the web for that error yields one previous thread on the dev list, so 
this seems to affect quite old versions of Flink too, but apparently it was 
never solved:
https://lists.apache.org/thread.html/07ce439bf6d358bd3139541b52ef6b8e8af249a27e09ae10b6698f81@%3Cdev.flink.apache.org%3E
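
For context, the failing assertion is a thread-leak check along these lines (an illustrative sketch, not the actual test code):
{code}
import scala.collection.JavaConverters._
import org.junit.Assert.fail

// Snapshot the running threads before the mini cluster starts ...
val threadsBefore = Thread.getAllStackTraces.keySet().asScala.toSet
// ... start and shut down the mini cluster, then verify nothing leaked:
for (t <- Thread.getAllStackTraces.keySet().asScala if !threadsBefore.contains(t)) {
  fail(s"Thread $t was started by the mini cluster, but not shut down")
}
{code}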



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (FLINK-7353) Bring project README.md up to date

2017-08-02 Thread Tzu-Li (Gordon) Tai (JIRA)
Tzu-Li (Gordon) Tai created FLINK-7353:
--

 Summary: Bring project README.md up to date
 Key: FLINK-7353
 URL: https://issues.apache.org/jira/browse/FLINK-7353
 Project: Flink
  Issue Type: Improvement
  Components: Documentation
Affects Versions: 1.4.0, 1.3.2
Reporter: Tzu-Li (Gordon) Tai


The project's README.md does not mention some of the most important new 
features, such as large state handling (out-of-core state, incremental 
checkpointing) and dynamic rescaling, to name a few. The Table API also isn't 
mentioned, although I think it is one of the most important high-level APIs in 
Flink right now.

We should revisit the README.md and make sure it presents Flink's most 
up-to-date features.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


Re: [POLL] Dropping savepoint format compatibility for 1.1.x in the Flink 1.4.0 release

2017-08-02 Thread Kostas Kloudas
+1

> On Aug 2, 2017, at 3:16 PM, Till Rohrmann  wrote:
> 
> +1
> 
> On Wed, Aug 2, 2017 at 9:12 AM, Stefan Richter 
> wrote:
> 
>> +1
>> 
>> On 28.07.2017 at 16:03, Stephan Ewen wrote:
>> 
>> Seems like no one raised a concern so far about dropping the savepoint
>> format compatibility for 1.1 in 1.4.
>> 
>> Leaving this thread open for some more days, but from the sentiment, it
>> seems like we should go ahead?
>> 
>> On Wed, Jul 12, 2017 at 4:43 PM, Stephan Ewen  wrote:
>> 
>>> Hi users!
>>> 
>>> Flink currently maintains backwards compatibility for savepoint formats,
>>> which means that savepoints taken with Flink version 1.1.x and 1.2.x can be
>>> resumed in Flink 1.3.x
>>> 
>>> We are discussing how many versions back to support. The proposition is
>>> the following:
>>> 
>>> *   Suggestion: Flink 1.4.0 will be able to resume savepoints taken with
>>> version 1.3.x and 1.2.x, but not savepoints from version 1.1.x and 1.0.x*
>>> 
>>> 
>>> The reason for that is that there is a lot of code mapping between the
>>> completely different legacy format (1.1.x, not re-scalable) and the
>>> key-group-oriented format (1.2.x onwards, re-scalable). It would greatly
>>> help the development of state and checkpointing features to drop that old
>>> code.
>>> 
>>> Please let us know if you have concerns about that.
>>> 
>>> Best,
>>> Stephan
>>> 
>>> 
>> 
>> 



Re: [POLL] Dropping savepoint format compatibility for 1.1.x in the Flink 1.4.0 release

2017-08-02 Thread Till Rohrmann
+1

On Wed, Aug 2, 2017 at 9:12 AM, Stefan Richter 
wrote:

> +1
>
> On 28.07.2017 at 16:03, Stephan Ewen wrote:
>
> Seems like no one raised a concern so far about dropping the savepoint
> format compatibility for 1.1 in 1.4.
>
> Leaving this thread open for some more days, but from the sentiment, it
> seems like we should go ahead?
>
> On Wed, Jul 12, 2017 at 4:43 PM, Stephan Ewen  wrote:
>
>> Hi users!
>>
>> Flink currently maintains backwards compatibility for savepoint formats,
>> which means that savepoints taken with Flink version 1.1.x and 1.2.x can be
>> resumed in Flink 1.3.x
>>
>> We are discussing how many versions back to support. The proposition is
>> the following:
>>
>> *   Suggestion: Flink 1.4.0 will be able to resume savepoints taken with
>> version 1.3.x and 1.2.x, but not savepoints from version 1.1.x and 1.0.x*
>>
>>
>> The reason for that is that there is a lot of code mapping between the
>> completely different legacy format (1.1.x, not re-scalable) and the
>> key-group-oriented format (1.2.x onwards, re-scalable). It would greatly
>> help the development of state and checkpointing features to drop that old
>> code.
>>
>> Please let us know if you have concerns about that.
>>
>> Best,
>> Stephan
>>
>>
>
>


[jira] [Created] (FLINK-7352) ExecutionGraphRestartTest timeouts

2017-08-02 Thread Nico Kruber (JIRA)
Nico Kruber created FLINK-7352:
--

 Summary: ExecutionGraphRestartTest timeouts
 Key: FLINK-7352
 URL: https://issues.apache.org/jira/browse/FLINK-7352
 Project: Flink
  Issue Type: Bug
  Components: Distributed Coordination, Tests
Affects Versions: 1.4.0, 1.3.2
Reporter: Nico Kruber
Priority: Critical


Recently, I received timeouts from some tests in {{ExecutionGraphRestartTest}}, 
like this:

{code}
Tests in error: 
  ExecutionGraphRestartTest.testConcurrentLocalFailAndRestart:638 » Timeout
{code}

This particular instance is from 1.3.2 RC2 and is stuck in 
{{ExecutionGraphTestUtils#waitUntilDeployedAndSwitchToRunning()}}, but I also 
had instances stuck in {{ExecutionGraphTestUtils#waitUntilJobStatus}}.
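
For context, these helpers are simple polling loops, roughly of this shape (an illustrative sketch with assumed names, not the actual test utility):
{code}
import java.util.concurrent.TimeoutException

// Poll until the condition holds or the deadline passes; a timeout here means
// the execution graph never reached the expected state.
def waitUntil(timeoutMs: Long)(condition: => Boolean): Unit = {
  val deadline = System.currentTimeMillis() + timeoutMs
  while (!condition) {
    if (System.currentTimeMillis() > deadline) {
      throw new TimeoutException(s"Condition not fulfilled within $timeoutMs ms")
    }
    Thread.sleep(10)
  }
}
{code}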



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (FLINK-7351) test instability in JobClientActorRecoveryITCase#testJobClientRecovery

2017-08-02 Thread Nico Kruber (JIRA)
Nico Kruber created FLINK-7351:
--

 Summary: test instability in 
JobClientActorRecoveryITCase#testJobClientRecovery
 Key: FLINK-7351
 URL: https://issues.apache.org/jira/browse/FLINK-7351
 Project: Flink
  Issue Type: Bug
  Components: Job-Submission, Tests
Affects Versions: 1.3.2
Reporter: Nico Kruber
Priority: Minor


On a 16-core VM, the following test failed during {{mvn clean verify}}:

{code}
Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 22.814 sec <<< 
FAILURE! - in org.apache.flink.runtime.client.JobClientActorRecoveryITCase
testJobClientRecovery(org.apache.flink.runtime.client.JobClientActorRecoveryITCase)
  Time elapsed: 21.299 sec  <<< ERROR!
org.apache.flink.runtime.client.JobExecutionException: Job execution failed.
at 
org.apache.flink.runtime.jobmanager.JobManager$$anonfun$handleMessage$1$$anonfun$applyOrElse$7.apply$mcV$sp(JobManager.scala:933)
at 
org.apache.flink.runtime.jobmanager.JobManager$$anonfun$handleMessage$1$$anonfun$applyOrElse$7.apply(JobManager.scala:876)
at 
org.apache.flink.runtime.jobmanager.JobManager$$anonfun$handleMessage$1$$anonfun$applyOrElse$7.apply(JobManager.scala:876)
at 
scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24)
at 
scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24)
at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:40)
at 
akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:397)
at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
at 
scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
at 
scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
at 
scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
Caused by: 
org.apache.flink.runtime.jobmanager.scheduler.NoResourceAvailableException: Not 
enough free slots available to run the job. You can decrease the operator 
parallelism or increase the number of slots per TaskManager in the 
configuration. Resources available to scheduler: Number of instances=0, total 
number of slots=0, available slots=0
at 
org.apache.flink.runtime.jobmanager.scheduler.Scheduler.scheduleTask(Scheduler.java:334)
at 
org.apache.flink.runtime.jobmanager.scheduler.Scheduler.allocateSlot(Scheduler.java:139)
at 
org.apache.flink.runtime.executiongraph.Execution.allocateSlotForExecution(Execution.java:368)
at 
org.apache.flink.runtime.executiongraph.Execution.scheduleForExecution(Execution.java:309)
at 
org.apache.flink.runtime.executiongraph.ExecutionVertex.scheduleForExecution(ExecutionVertex.java:596)
at 
org.apache.flink.runtime.executiongraph.ExecutionJobVertex.scheduleAll(ExecutionJobVertex.java:450)
at 
org.apache.flink.runtime.executiongraph.ExecutionGraph.scheduleLazy(ExecutionGraph.java:834)
at 
org.apache.flink.runtime.executiongraph.ExecutionGraph.scheduleForExecution(ExecutionGraph.java:814)
at 
org.apache.flink.runtime.jobmanager.JobManager$$anonfun$org$apache$flink$runtime$jobmanager$JobManager$$submitJob$1.apply$mcV$sp(JobManager.scala:1425)
at 
org.apache.flink.runtime.jobmanager.JobManager$$anonfun$org$apache$flink$runtime$jobmanager$JobManager$$submitJob$1.apply(JobManager.scala:1372)
at 
org.apache.flink.runtime.jobmanager.JobManager$$anonfun$org$apache$flink$runtime$jobmanager$JobManager$$submitJob$1.apply(JobManager.scala:1372)
at 
scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24)
at 
scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24)
at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:40)
at 
akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:397)
at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
at 
scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
at 
scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
at 
scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)

{code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (FLINK-7350) only execute japicmp in one build profile

2017-08-02 Thread Chesnay Schepler (JIRA)
Chesnay Schepler created FLINK-7350:
---

 Summary: only execute japicmp in one build profile
 Key: FLINK-7350
 URL: https://issues.apache.org/jira/browse/FLINK-7350
 Project: Flink
  Issue Type: Improvement
  Components: Travis
Affects Versions: 1.4.0
Reporter: Chesnay Schepler
Assignee: Chesnay Schepler
 Fix For: 1.4.0


Similarly to FLINK-7349, we can improve build times (and stability!) by only 
executing the japicmp plugin in the build profile that builds all of Flink.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (FLINK-7349) Only execute checkstyle in one build profile

2017-08-02 Thread Chesnay Schepler (JIRA)
Chesnay Schepler created FLINK-7349:
---

 Summary: Only execute checkstyle in one build profile
 Key: FLINK-7349
 URL: https://issues.apache.org/jira/browse/FLINK-7349
 Project: Flink
  Issue Type: Improvement
  Components: Checkstyle, Travis
Affects Versions: 1.4.0
Reporter: Chesnay Schepler
Assignee: Chesnay Schepler
 Fix For: 1.4.0


We can save some time in 4 of the 5 build profiles by skipping checkstyle. The 
build profile that builds Flink completely would suffice as a check.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (FLINK-7348) Allow redundant modifiers on methods

2017-08-02 Thread Chesnay Schepler (JIRA)
Chesnay Schepler created FLINK-7348:
---

 Summary: Allow redundant modifiers on methods
 Key: FLINK-7348
 URL: https://issues.apache.org/jira/browse/FLINK-7348
 Project: Flink
  Issue Type: Improvement
  Components: Checkstyle
Affects Versions: 1.4.0
Reporter: Chesnay Schepler
Assignee: Chesnay Schepler
 Fix For: 1.4.0


As per the discussion in https://github.com/apache/flink/pull/4447 we should 
allow redundant modifiers on methods, and revert changes that removed {{final}} 
modifiers from methods.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (FLINK-7347) "removeAll" is extremely inefficient in MessageAcknowledgingSourceBase.notifyCheckpointComplete

2017-08-02 Thread Yonatan Most (JIRA)
Yonatan Most created FLINK-7347:
---

 Summary: "removeAll" is extremely inefficient in 
MessageAcknowledgingSourceBase.notifyCheckpointComplete
 Key: FLINK-7347
 URL: https://issues.apache.org/jira/browse/FLINK-7347
 Project: Flink
  Issue Type: Improvement
  Components: DataStream API
Affects Versions: 1.3.1
Reporter: Yonatan Most


Observe this line in 
{{MessageAcknowledgingSourceBase.notifyCheckpointComplete}}:
{code}
idsProcessedButNotAcknowledged.removeAll(checkpoint.f1);
{code}

The implementation of {{removeAll}} is such that if the set is smaller than the 
collection to remove, then the set is iterated and every item is checked for 
containment in the collection. The type of {{checkpoint.f1}} here is 
{{ArrayList}}, so the {{contains}} action is very inefficient, and it is 
performed for every item in {{idsProcessedButNotAcknowledged}}.
In our pipeline we had about 10 million events processed, and the checkpoint 
was stuck on the {{removeAll}} call for hours.
A simple solution is to make {{idsForCurrentCheckpoint}} a {{HashSet}} instead 
of an {{ArrayList}}. The fact that it's a list is not really used anywhere.
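
For illustration, the asymptotics in a nutshell (see {{java.util.AbstractSet#removeAll}}; variable names mirror the issue description, everything else is assumed):
{code}
// Illustrative sketch only.
val processed = new java.util.HashSet[java.lang.Long]()   // idsProcessedButNotAcknowledged
val toRemove  = new java.util.ArrayList[java.lang.Long]() // checkpoint.f1

// If processed.size() <= toRemove.size(), removeAll iterates the set and calls
// toRemove.contains(..) for every element -- O(n * m) on an ArrayList.
processed.removeAll(toRemove)

// Collecting the ids into a HashSet first makes contains(..) O(1), so the
// whole call becomes roughly O(n + m).
processed.removeAll(new java.util.HashSet[java.lang.Long](toRemove))
{code}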



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (FLINK-7346) EventTimeWindowCheckpointingITCase.testPreAggregatedSlidingTimeWindow killed by maven watchdog on Travis

2017-08-02 Thread Nico Kruber (JIRA)
Nico Kruber created FLINK-7346:
--

 Summary: 
EventTimeWindowCheckpointingITCase.testPreAggregatedSlidingTimeWindow killed by 
maven watchdog on Travis
 Key: FLINK-7346
 URL: https://issues.apache.org/jira/browse/FLINK-7346
 Project: Flink
  Issue Type: Bug
  Components: DataStream API, Tests
Affects Versions: 1.4.0
Reporter: Nico Kruber
Assignee: Nico Kruber


Several test runs fail with the watchdog killing the tests after receiving no 
output for 300s, showing 
{{EventTimeWindowCheckpointingITCase.testPreAggregatedSlidingTimeWindow}} as 
one of the running tests, sometimes with another test running in parallel. The 
tests do not seem to be hanging, though; the likely reason for this behaviour 
is that the tests, especially with RocksDB, take very long while producing no 
output from the individual test methods in 
{{AbstractEventTimeWindowCheckpointingITCase}}. Thus, adding output per method 
should fix the spurious test failures.

Some failing instances:
https://travis-ci.org/apache/flink/jobs/259460738
https://travis-ci.org/apache/flink/jobs/259748656



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (FLINK-7345) Add BRANCH PATTERN.

2017-08-02 Thread zhangxiaoyu (JIRA)
zhangxiaoyu created FLINK-7345:
--

 Summary: Add BRANCH PATTERN.
 Key: FLINK-7345
 URL: https://issues.apache.org/jira/browse/FLINK-7345
 Project: Flink
  Issue Type: Bug
Reporter: zhangxiaoyu


Try to support branch patterns.
The details are at 
https://docs.google.com/document/d/1YNjOYF7BagM4agx_TI6hQkmraVLT9ZpKhBsgHDGi-U8/edit#



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


Re: [POLL] Dropping savepoint format compatibility for 1.1.x in the Flink 1.4.0 release

2017-08-02 Thread Stefan Richter
+1

> On 28.07.2017 at 16:03, Stephan Ewen wrote:
> 
> Seems like no one raised a concern so far about dropping the savepoint format 
> compatibility for 1.1 in 1.4.
> 
> Leaving this thread open for some more days, but from the sentiment, it seems 
> like we should go ahead?
> 
> On Wed, Jul 12, 2017 at 4:43 PM, Stephan Ewen wrote:
> Hi users!
> 
> Flink currently maintains backwards compatibility for savepoint formats, 
> which means that savepoints taken with Flink version 1.1.x and 1.2.x can be 
> resumed in Flink 1.3.x
> 
> We are discussing how many versions back to support. The proposition is the 
> following:
> 
>Suggestion: Flink 1.4.0 will be able to resume savepoints taken with 
> version 1.3.x and 1.2.x, but not savepoints from version 1.1.x and 1.0.x
> 
> 
> The reason for that is that there is a lot of code mapping between the 
> completely different legacy format (1.1.x, not re-scalable) and the 
> key-group-oriented format (1.2.x onwards, re-scalable). It would greatly help 
> the development of state and checkpointing features to drop that old code.
> 
> Please let us know if you have concerns about that.
> 
> Best,
> Stephan
> 
> 



[jira] [Created] (FLINK-7343) Kafka010ProducerITCase instability

2017-08-02 Thread Piotr Nowojski (JIRA)
Piotr Nowojski created FLINK-7343:
-

 Summary: Kafka010ProducerITCase instability
 Key: FLINK-7343
 URL: https://issues.apache.org/jira/browse/FLINK-7343
 Project: Flink
  Issue Type: Bug
  Components: Kafka Connector
Reporter: Piotr Nowojski
Assignee: Piotr Nowojski


As reported by [~till.rohrmann] in 
https://issues.apache.org/jira/browse/FLINK-6996, there seems to be a test 
instability with 
`Kafka010ProducerITCase>KafkaProducerTestBase.testOneToOneAtLeastOnceRegularSink`

https://travis-ci.org/tillrohrmann/flink/jobs/258538641

It is probably related to the log.flush intervals in Kafka, which delay flushing 
the data to files, potentially causing data losses when Kafka brokers are killed 
in the tests.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


Re: Towards a spec for robust streaming SQL, Part 2

2017-08-02 Thread Tyler Akidau
Thank you all for the comments/input, I appreciate the time you've put into
this. I've responded to a handful of the major ones. There are some more
I'd like to respond to, but I'm out of time for tonight, so more tomorrow.

-Tyler

On Tue, Aug 1, 2017 at 12:24 PM Julian Hyde  wrote:

> I have problems with a couple of the axioms: that a SQL object is
> either a table or a stream, but not both; and that a query is bounded
> if and only if it contains no unbounded streams.
>
> I don't have problems with other axioms, such as that a query is either
> bounded or unbounded. And I haven't looked in detail at triggering
> semantics; I don't think there will be major issues, but let's clear
> up the 2 problems above first.
>
> I have added a section "Julian’s thoughts on the fundamentals" to the
> end of the document.
>
> Julian
>
>
> On Tue, Aug 1, 2017 at 6:40 AM, Fabian Hueske  wrote:
> > As promised, I went over the document and made some comments.
> > I also added a bit of information about the current SQL support in Flink
> > and its internals.
> >
> > Thanks, Fabian
> >
> > 2017-07-30 13:22 GMT+02:00 Shaoxuan Wang :
> >
> >> Hi Tyler,
> >> Thanks for putting all the efforts into a doc. It is really well written
> >> and organized.
> >> I like most of it. The major concern I have is about the "explicit
> >> trigger". I left a few comments towards this and would like to know what
> >> the others think about it.
> >>
> >> Regards,
> >> Shaoxuan
> >>
> >> On Sun, Jul 30, 2017 at 4:43 PM, Fabian Hueske 
> wrote:
> >>
> >> > Thanks for the great write up!
> >> >
> >> > I think this is a very good starting point for a detailed discussion
> about
> >> > features, syntax and semantics of streaming SQL.
> >> > I'll comment on the document in the next days and describe Flink's
> >> current
> >> > status, our approaches (or planned approaches) and ask a couple of
> >> > questions.
> >> >
> >> > Thanks, Fabian
> >> >
> >> > 2017-07-28 3:05 GMT+02:00 Julian Hyde :
> >> >
> >> > > Tyler,
> >> > >
> >> > > Thanks for this. I am reading the document thoroughly and will give
> my
> >> > > feedback in a day or two.
> >> > >
> >> > > Julian
> >> > >
> >> > > > On Jul 25, 2017, at 12:54 PM, Pramod Immaneni <
> >> pra...@datatorrent.com>
> >> > > wrote:
> >> > > >
> >> > > > Thanks for the invitation Tyler. I am sure folks who worked on the
> >> > > calcite
> >> > > > integration and others would be interested.
> >> > > >
> >> > > > On Tue, Jul 25, 2017 at 12:12 PM, Tyler Akidau
> >> > > 
> >> > > > wrote:
> >> > > >
> >> > > >> +d...@apex.apache.org, since I'm told Apex has a Calcite
> integration
> >> > as
> >> > > >> well. If anyone on the Apex side wants to join in on the fun,
> your
> >> > input
> >> > > >> would be welcomed!
> >> > > >>
> >> > > >> -Tyler
> >> > > >>
> >> > > >>
> >> > > >> On Mon, Jul 24, 2017 at 4:34 PM Tyler Akidau  >
> >> > > wrote:
> >> > > >>
> >> > > >>> Hello Flink, Calcite, and Beam dev lists!
> >> > > >>>
> >> > > >>> Linked below is the second document I promised way back in April
> >> > > >> regarding
> >> > > >>> a collaborative spec for streaming SQL in Beam/Calcite/Flink (&
> >> > > apologies
> >> > > >>> for the delay; I thought I was nearly done a while back and then
> >> > > temporal
> >> > > >>> joins expanded to something much larger than expected).
> >> > > >>>
> >> > > >>> To repeat what it says in the doc, my hope is that it can serve
> >> > various
> >> > > >>> purposes over its lifetime:
> >> > > >>>
> >> > > >>>   - A discussion ground for ironing out any remaining features
> >> > > necessary
> >> > > >>>   for supporting robust streaming semantics in Calcite SQL.
> >> > > >>>
> >> > > >>>   - A rough, high-level source of truth for tracking efforts
> >> underway
> >> > > in
> >> > > >>>   support of this, currently spanning the Calcite, Flink, and
> Beam
> >> > > >> projects.
> >> > > >>>
> >> > > >>>   - A written specification of the changes that were made, for
> the
> >> > sake
> >> > > >>>   of understanding the delta after the fact.
> >> > > >>>
> >> > > >>> The first and third points are, IMO, the most important. AFAIK,
> >> there
> >> > > are
> >> > > >>> a few features missing still that need to be defined (e.g.,
> >> triggers
> >> > > >>> equivalents via EMIT, robust temporal join support). I'm also
> >> > > proposing a
> >> > > >>> clear distinction of streams and tables, which I think is
> >> important,
> >> > > but
> >> > > >>> which I believe is not the approach most folks have been taking
> in
> >> > this
> >> > > >>> area. Sorting out these open issues and then having a concise
> >> record
> >> > of
> >> > > >> the
> >> > > >>> solutions adopted will be important for providing a solid
> streaming
> >> > > >>> experience and teaching folks how to use it.
> >> > > >>>
> >> > > >>> At any rate, I would much