[jira] [Closed] (FLINK-3977) Subclasses of InternalWindowFunction must support OutputTypeConfigurable

2016-06-18 Thread Fabian Hueske (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-3977?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fabian Hueske closed FLINK-3977.

   Resolution: Fixed
Fix Version/s: 1.1.0

Fixed with 18744b2c846aa51ed317c4c7409519f25e3eafb7

> Subclasses of InternalWindowFunction must support OutputTypeConfigurable
> 
>
> Key: FLINK-3977
> URL: https://issues.apache.org/jira/browse/FLINK-3977
> Project: Flink
>  Issue Type: Bug
>  Components: Streaming
>Reporter: Aljoscha Krettek
>Assignee: Fabian Hueske
>Priority: Critical
> Fix For: 1.1.0
>
>
> Right now, if subclasses of InternalWindowFunction wrap functions and a 
> wrapped function implements {{OutputTypeConfigurable}}, {{setOutputType}} is 
> never called. This manifests itself, for example, in a FoldFunction on a 
> window with an evictor not working.
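A fix along these lines forwards the call through the wrapper. The sketch below is self-contained and uses simplified stand-in types — the real Flink interfaces take a TypeInformation and an ExecutionConfig, so names and signatures here are illustrative only:

```java
import java.util.concurrent.atomic.AtomicReference;

// Simplified stand-in for org.apache.flink.streaming.api.operators.OutputTypeConfigurable;
// the real method is setOutputType(TypeInformation<T>, ExecutionConfig).
interface OutputTypeConfigurable {
    void setOutputType(String outputType);
}

// Stand-in for a user function (e.g. a FoldFunction) that needs its output type.
class WrappedFold implements OutputTypeConfigurable {
    final AtomicReference<String> received = new AtomicReference<>();
    public void setOutputType(String outputType) { received.set(outputType); }
}

// The wrapper itself implements OutputTypeConfigurable and forwards the call
// to the wrapped function whenever that function also implements the interface.
class InternalWindowFunctionWrapper implements OutputTypeConfigurable {
    private final Object wrapped;
    InternalWindowFunctionWrapper(Object wrapped) { this.wrapped = wrapped; }

    public void setOutputType(String outputType) {
        if (wrapped instanceof OutputTypeConfigurable) {
            ((OutputTypeConfigurable) wrapped).setOutputType(outputType);
        }
    }
}
```

The bug described above corresponds to the wrapper not implementing the interface at all, so the forwarding never happens.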



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Closed] (FLINK-3641) Document registerCachedFile API call

2016-06-18 Thread Fabian Hueske (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-3641?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fabian Hueske closed FLINK-3641.

   Resolution: Done
Fix Version/s: 1.1.0

Done with ba62df14a52660cec85783b1070821acd144fd06

> Document registerCachedFile API call
> 
>
> Key: FLINK-3641
> URL: https://issues.apache.org/jira/browse/FLINK-3641
> Project: Flink
>  Issue Type: Improvement
>  Components: Java API, Scala API
>Affects Versions: 1.1.0
>Reporter: Till Rohrmann
>Assignee: Fabian Hueske
>Priority: Minor
> Fix For: 1.1.0
>
>
> Flink's stable API supports the {{registerCachedFile}} API call at the 
> {{ExecutionEnvironment}}. However, it is nowhere mentioned in the online 
> documentation. Furthermore, the {{DistributedCache}} is also not explained.
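For reference, the call pattern being documented is: register a file under a name on the ExecutionEnvironment, then fetch it by the same name from the DistributedCache inside a rich function. The sketch below models that contract with stand-in types so it runs standalone; the real calls are env.registerCachedFile(path, name) and getRuntimeContext().getDistributedCache().getFile(name), which returns a local File copy:

```java
import java.util.HashMap;
import java.util.Map;

// Stand-in sketch of the registerCachedFile / DistributedCache contract,
// not the actual Flink classes.
class DistributedCacheSketch {
    private final Map<String, String> nameToPath = new HashMap<>();

    // mirrors ExecutionEnvironment#registerCachedFile(String filePath, String name)
    void registerCachedFile(String filePath, String name) {
        nameToPath.put(name, filePath);
    }

    // mirrors DistributedCache#getFile(String name); the real version returns
    // a java.io.File pointing at a locally cached copy on the TaskManager
    String getFile(String name) {
        String path = nameToPath.get(name);
        if (path == null) {
            throw new IllegalArgumentException(
                "File with name '" + name + "' is not registered");
        }
        return path;
    }
}
```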





[jira] [Commented] (FLINK-3949) Collect Metrics in Runtime Operators

2016-06-18 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15338209#comment-15338209
 ] 

ASF GitHub Bot commented on FLINK-3949:
---

Github user asfgit closed the pull request at:

https://github.com/apache/flink/pull/2119


> Collect Metrics in Runtime Operators
> 
>
> Key: FLINK-3949
> URL: https://issues.apache.org/jira/browse/FLINK-3949
> Project: Flink
>  Issue Type: Sub-task
>  Components: Local Runtime
>Reporter: Stephan Ewen
>Assignee: Chesnay Schepler
> Fix For: 1.1.0
>
>






[jira] [Commented] (FLINK-3641) Document registerCachedFile API call

2016-06-18 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3641?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15338207#comment-15338207
 ] 

ASF GitHub Bot commented on FLINK-3641:
---

Github user asfgit closed the pull request at:

https://github.com/apache/flink/pull/2122


> Document registerCachedFile API call
> 
>
> Key: FLINK-3641
> URL: https://issues.apache.org/jira/browse/FLINK-3641
> Project: Flink
>  Issue Type: Improvement
>  Components: Java API, Scala API
>Affects Versions: 1.1.0
>Reporter: Till Rohrmann
>Assignee: Fabian Hueske
>Priority: Minor
>
> Flink's stable API supports the {{registerCachedFile}} API call at the 
> {{ExecutionEnvironment}}. However, it is nowhere mentioned in the online 
> documentation. Furthermore, the {{DistributedCache}} is also not explained.





[jira] [Commented] (FLINK-3977) Subclasses of InternalWindowFunction must support OutputTypeConfigurable

2016-06-18 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3977?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15338208#comment-15338208
 ] 

ASF GitHub Bot commented on FLINK-3977:
---

Github user asfgit closed the pull request at:

https://github.com/apache/flink/pull/2118


> Subclasses of InternalWindowFunction must support OutputTypeConfigurable
> 
>
> Key: FLINK-3977
> URL: https://issues.apache.org/jira/browse/FLINK-3977
> Project: Flink
>  Issue Type: Bug
>  Components: Streaming
>Reporter: Aljoscha Krettek
>Assignee: Fabian Hueske
>Priority: Critical
>
> Right now, if subclasses of InternalWindowFunction wrap functions and a 
> wrapped function implements {{OutputTypeConfigurable}}, {{setOutputType}} is 
> never called. This manifests itself, for example, in a FoldFunction on a 
> window with an evictor not working.





[GitHub] flink pull request #2118: [FLINK-3977] InternalWindowFunctions implement Out...

2016-06-18 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/flink/pull/2118


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink pull request #2122: [FLINK-3641] Add documentation for DataSet distrib...

2016-06-18 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/flink/pull/2122




[GitHub] flink pull request #2119: [FLINK-3949] Add numSplitsProcessed (Streaming)

2016-06-18 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/flink/pull/2119




[jira] [Commented] (FLINK-3641) Document registerCachedFile API call

2016-06-18 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3641?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15338205#comment-15338205
 ] 

ASF GitHub Bot commented on FLINK-3641:
---

Github user fhueske commented on the issue:

https://github.com/apache/flink/pull/2122
  
Merging


> Document registerCachedFile API call
> 
>
> Key: FLINK-3641
> URL: https://issues.apache.org/jira/browse/FLINK-3641
> Project: Flink
>  Issue Type: Improvement
>  Components: Java API, Scala API
>Affects Versions: 1.1.0
>Reporter: Till Rohrmann
>Assignee: Fabian Hueske
>Priority: Minor
>
> Flink's stable API supports the {{registerCachedFile}} API call at the 
> {{ExecutionEnvironment}}. However, it is nowhere mentioned in the online 
> documentation. Furthermore, the {{DistributedCache}} is also not explained.





[GitHub] flink issue #2122: [FLINK-3641] Add documentation for DataSet distributed ca...

2016-06-18 Thread fhueske
Github user fhueske commented on the issue:

https://github.com/apache/flink/pull/2122
  
Merging




[jira] [Commented] (FLINK-3859) Add BigDecimal/BigInteger support to Table API

2016-06-18 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15338204#comment-15338204
 ] 

ASF GitHub Bot commented on FLINK-3859:
---

Github user fhueske commented on the issue:

https://github.com/apache/flink/pull/2088
  
Won't merge this PR at this point. A few Travis builds timed out and I 
found that the added expression tests increase build time by about one minute 
due to the code-gen compilation overhead. 

@twalthr, do you think we can adapt the tests to batch several expressions 
into a single class with multiple methods to invoke the compiler less 
frequently?
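The batching idea can be sketched structurally: generate one class with one method per expression and hand the compiler a single source, instead of one source per expression. The compiler stand-in below only counts invocations; class and method names are made up for illustration:

```java
import java.util.List;

// Structural sketch of batching generated expression tests into one class
// so the (expensive) code-gen compiler runs once instead of once per expression.
class CodeGenBatchSketch {
    static int compilerInvocations = 0;

    // stand-in for the real compile step (Janino/javac in Flink's code gen)
    static void compile(String source) { compilerInvocations++; }

    // one generated class per expression: N compiler invocations
    static void compilePerExpression(List<String> expressions) {
        for (String expr : expressions) {
            compile("class Test { Object eval() { return " + expr + "; } }");
        }
    }

    // one generated class with N methods: a single compiler invocation
    static void compileBatched(List<String> expressions) {
        StringBuilder src = new StringBuilder("class Tests {\n");
        for (int i = 0; i < expressions.size(); i++) {
            src.append("  Object eval").append(i)
               .append("() { return ").append(expressions.get(i)).append("; }\n");
        }
        src.append("}\n");
        compile(src.toString());
    }
}
```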


> Add BigDecimal/BigInteger support to Table API
> --
>
> Key: FLINK-3859
> URL: https://issues.apache.org/jira/browse/FLINK-3859
> Project: Flink
>  Issue Type: New Feature
>  Components: Table API & SQL
>Reporter: Timo Walther
>Assignee: Timo Walther
>Priority: Critical
>
> Since FLINK-3786 has been solved, we can now start integrating 
> BigDecimal/BigInteger into the Table API.





[GitHub] flink issue #2088: [FLINK-3859] [table] Add BigDecimal/BigInteger support to...

2016-06-18 Thread fhueske
Github user fhueske commented on the issue:

https://github.com/apache/flink/pull/2088
  
Won't merge this PR at this point. A few Travis builds timed out and I 
found that the added expression tests increase build time by about one minute 
due to the code-gen compilation overhead. 

@twalthr, do you think we can adapt the tests to batch several expressions 
into a single class with multiple methods to invoke the compiler less 
frequently?




[jira] [Commented] (FLINK-2618) ExternalSortITCase failure

2016-06-18 Thread Kostas Kloudas (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15338186#comment-15338186
 ] 

Kostas Kloudas commented on FLINK-2618:
---

And another one: 
https://s3.amazonaws.com/archive.travis-ci.org/jobs/138556355/log.txt

> ExternalSortITCase failure
> --
>
> Key: FLINK-2618
> URL: https://issues.apache.org/jira/browse/FLINK-2618
> Project: Flink
>  Issue Type: Bug
>Reporter: Sachin Goel
>  Labels: test-stability
>
> {{ExternalSortITCase.testSpillingSortWithIntermediateMerge}} fails with the 
> exception:
> {code}
> org.apache.flink.types.NullKeyFieldException: Field 0 is null, but expected 
> to hold a key.
>   at 
> org.apache.flink.api.common.typeutils.record.RecordComparator.setReference(RecordComparator.java:212)
>   at 
> org.apache.flink.api.common.typeutils.record.RecordComparator.setReference(RecordComparator.java:40)
>   at 
> org.apache.flink.runtime.operators.sort.MergeIterator$HeadStream.nextHead(MergeIterator.java:127)
>   at 
> org.apache.flink.runtime.operators.sort.MergeIterator.next(MergeIterator.java:88)
>   at 
> org.apache.flink.runtime.operators.sort.MergeIterator.next(MergeIterator.java:69)
>   at 
> org.apache.flink.runtime.operators.sort.ExternalSortITCase.testSpillingSortWithIntermediateMerge(ExternalSortITCase.java:301)
> {code}
> Here is the build log: 
> https://s3.amazonaws.com/archive.travis-ci.org/jobs/78579607/log.txt





[jira] [Commented] (FLINK-3677) FileInputFormat: Allow to specify include/exclude file name patterns

2016-06-18 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15338100#comment-15338100
 ] 

ASF GitHub Bot commented on FLINK-3677:
---

Github user zentol commented on the issue:

https://github.com/apache/flink/pull/2109
  
There are a number of unstable tests that we are aware of. Rebasing to the 
latest master can reduce these issues, but they aren't completely resolved yet.


> FileInputFormat: Allow to specify include/exclude file name patterns
> 
>
> Key: FLINK-3677
> URL: https://issues.apache.org/jira/browse/FLINK-3677
> Project: Flink
>  Issue Type: Improvement
>  Components: Core
>Affects Versions: 1.0.0
>Reporter: Maximilian Michels
>Assignee: Ivan Mushketyk
>Priority: Minor
>  Labels: starter
>
> It would be nice to be able to specify a regular expression to filter files.
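To illustrate the requested feature, a minimal include/exclude filter over file names might look like the sketch below. The concrete API and configuration keys of PR #2109 were still under review at this point, so everything here is illustrative rather than the actual FileInputFormat change:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Illustrative sketch of include/exclude file-name filtering for a
// FileInputFormat; the real key names and wiring were still under discussion.
class FileNameFilterSketch {
    private final Pattern include; // null means "accept everything"
    private final Pattern exclude; // null means "exclude nothing"

    FileNameFilterSketch(String includeRegex, String excludeRegex) {
        this.include = includeRegex == null ? null : Pattern.compile(includeRegex);
        this.exclude = excludeRegex == null ? null : Pattern.compile(excludeRegex);
    }

    // a file is read only if it matches the include pattern (when set)
    // and does not match the exclude pattern (when set)
    boolean accept(String fileName) {
        if (include != null && !include.matcher(fileName).matches()) return false;
        if (exclude != null && exclude.matcher(fileName).matches()) return false;
        return true;
    }

    List<String> filter(List<String> fileNames) {
        List<String> accepted = new ArrayList<>();
        for (String name : fileNames) {
            if (accept(name)) accepted.add(name);
        }
        return accepted;
    }
}
```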





[GitHub] flink issue #2109: [FLINK-3677] FileInputFormat: Allow to specify include/ex...

2016-06-18 Thread zentol
Github user zentol commented on the issue:

https://github.com/apache/flink/pull/2109
  
There are a number of unstable tests that we are aware of. Rebasing to the 
latest master can reduce these issues, but they aren't completely resolved yet.




[jira] [Commented] (FLINK-3677) FileInputFormat: Allow to specify include/exclude file name patterns

2016-06-18 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15338077#comment-15338077
 ] 

ASF GitHub Bot commented on FLINK-3677:
---

Github user mushketyk commented on the issue:

https://github.com/apache/flink/pull/2109
  
By the way, the CI build seems to fail pretty much randomly. Do I need to 
rebase my changes onto a later revision to solve this?


> FileInputFormat: Allow to specify include/exclude file name patterns
> 
>
> Key: FLINK-3677
> URL: https://issues.apache.org/jira/browse/FLINK-3677
> Project: Flink
>  Issue Type: Improvement
>  Components: Core
>Affects Versions: 1.0.0
>Reporter: Maximilian Michels
>Assignee: Ivan Mushketyk
>Priority: Minor
>  Labels: starter
>
> It would be nice to be able to specify a regular expression to filter files.





[GitHub] flink issue #2109: [FLINK-3677] FileInputFormat: Allow to specify include/ex...

2016-06-18 Thread mushketyk
Github user mushketyk commented on the issue:

https://github.com/apache/flink/pull/2109
  
By the way, the CI build seems to fail pretty much randomly. Do I need to 
rebase my changes onto a later revision to solve this?




[jira] [Commented] (FLINK-3677) FileInputFormat: Allow to specify include/exclude file name patterns

2016-06-18 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15338076#comment-15338076
 ] 

ASF GitHub Bot commented on FLINK-3677:
---

Github user mushketyk commented on the issue:

https://github.com/apache/flink/pull/2109
  
Ok, I'll update my PR once others have commented on it.


> FileInputFormat: Allow to specify include/exclude file name patterns
> 
>
> Key: FLINK-3677
> URL: https://issues.apache.org/jira/browse/FLINK-3677
> Project: Flink
>  Issue Type: Improvement
>  Components: Core
>Affects Versions: 1.0.0
>Reporter: Maximilian Michels
>Assignee: Ivan Mushketyk
>Priority: Minor
>  Labels: starter
>
> It would be nice to be able to specify a regular expression to filter files.





[GitHub] flink issue #2109: [FLINK-3677] FileInputFormat: Allow to specify include/ex...

2016-06-18 Thread mushketyk
Github user mushketyk commented on the issue:

https://github.com/apache/flink/pull/2109
  
Ok, I'll update my PR once others have commented on it.




[jira] [Commented] (FLINK-4091) flink-connector-cassandra has conflicting guava version

2016-06-18 Thread Chesnay Schepler (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-4091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15337971#comment-15337971
 ] 

Chesnay Schepler commented on FLINK-4091:
-

If I recall correctly, that's just what we did, but it would be interesting to 
see what happens if the cassandra jar is available under /lib and not 
contained in the fat jar.

Shading it to a different namespace would be the easiest solution, but I'm 
still curious why this popped up now.

> flink-connector-cassandra has conflicting guava version
> ---
>
> Key: FLINK-4091
> URL: https://issues.apache.org/jira/browse/FLINK-4091
> Project: Flink
>  Issue Type: Bug
>  Components: Streaming Connectors
>Affects Versions: 1.1.0
> Environment: MacOSX, 1.10-SNAPSHOT (head is 
> 1a6bab3ef76805685044cf4521e32315169f9033)
>Reporter: Dominik Bruhn
>
> The newly merged cassandra streaming connector has an issue with its guava 
> dependency.
> The build-process for flink-connector-cassandra creates a shaded JAR file which 
> contains the connector, the datastax cassandra driver, plus a shaded copy of 
> guava in org.apache.flink.shaded.
> The datastax cassandra driver calls into Futures.withFallback ([1]), which is 
> present in this guava version. This also works inside the 
> flink-connector-cassandra jar.
> Now the actual build-process for Flink happens and builds another shaded JAR 
> and creates the flink-dist.jar. Inside this JAR, there is also a shaded 
> version of guava inside org.apache.flink.shaded.
> Now the issue: The guava version which is in the flink-dist.jar is not 
> compatible and doesn't contain the Futures.withFallback which the datastax 
> driver is using.
> This leads to the following issue: you can launch a flink task which uses the 
> cassandra driver locally (i.e., through the mini-cluster) without any problems, 
> because that path never uses the flink-dist.jar. 
> BUT: as soon as you try to start this job on a flink cluster (which 
> uses the flink-dist.jar), the job breaks with the following exception:
> https://gist.github.com/theomega/5ab9b14ffb516b15814de28e499b040d
> You can inspect this by opening the 
> flink-connector-cassandra_2.11-1.1-SNAPSHOT.jar and the 
> flink-dist_2.11-1.1-SNAPSHOT.jar in a java decompiler.
> I don't know a good solution here: perhaps one option would be to shade 
> the guava for the cassandra-driver somewhere other than 
> org.apache.flink.shaded.
> [1]: 
> https://google.github.io/guava/releases/19.0/api/docs/com/google/common/util/concurrent/Futures.html#withFallback(com.google.common.util.concurrent.ListenableFuture,
>  com.google.common.util.concurrent.FutureFallback, 
> java.util.concurrent.Executor)
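The relocation idea floated at the end (shading guava somewhere other than org.apache.flink.shaded, so the connector's copy cannot collide with flink-dist's copy) would look roughly like this in the connector's pom.xml. This is a sketch, not the project's actual build configuration, and the target namespace is made up:

```xml
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <executions>
    <execution>
      <phase>package</phase>
      <goals><goal>shade</goal></goals>
      <configuration>
        <relocations>
          <!-- move the driver's guava into a connector-private namespace -->
          <relocation>
            <pattern>com.google.common</pattern>
            <shadedPattern>org.apache.flink.cassandra.shaded.com.google.common</shadedPattern>
          </relocation>
        </relocations>
      </configuration>
    </execution>
  </executions>
</plugin>
```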





[jira] [Updated] (FLINK-3801) Upgrade Joda-Time library to 2.9.3

2016-06-18 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-3801?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated FLINK-3801:
--
Description: 
Currently joda-time 2.5 is used, which is very old.

We should upgrade to 2.9.3.

  was:
Currently yoda-time 2.5 is used which was very old.


We should upgrade to 2.9.3


> Upgrade Joda-Time library to 2.9.3
> --
>
> Key: FLINK-3801
> URL: https://issues.apache.org/jira/browse/FLINK-3801
> Project: Flink
>  Issue Type: Improvement
>Reporter: Ted Yu
>Priority: Minor
>
> Currently joda-time 2.5 is used, which is very old.
> We should upgrade to 2.9.3.





[GitHub] flink issue #1517: [FLINK-3477] [runtime] Add hash-based combine strategy fo...

2016-06-18 Thread ggevay
Github user ggevay commented on the issue:

https://github.com/apache/flink/pull/1517
  
I have rebased to the current master.




[jira] [Commented] (FLINK-3477) Add hash-based combine strategy for ReduceFunction

2016-06-18 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15337835#comment-15337835
 ] 

ASF GitHub Bot commented on FLINK-3477:
---

Github user ggevay commented on the issue:

https://github.com/apache/flink/pull/1517
  
I have rebased to the current master.


> Add hash-based combine strategy for ReduceFunction
> --
>
> Key: FLINK-3477
> URL: https://issues.apache.org/jira/browse/FLINK-3477
> Project: Flink
>  Issue Type: Sub-task
>  Components: Local Runtime
>Reporter: Fabian Hueske
>Assignee: Gabor Gevay
>
> This issue is about adding a hash-based combine strategy for ReduceFunctions.
> The interface of the {{reduce()}} method is as follows:
> {code}
> public T reduce(T v1, T v2)
> {code}
> Input type and output type are identical and the function returns only a 
> single value. A Reduce function is incrementally applied to compute a final 
> aggregated value. This allows holding the pre-aggregated value in a hash table 
> and updating it with each function call. 
> The hash-based strategy requires a special implementation of an in-memory hash 
> table. The hash table should support in-place updates of elements (if the 
> new value has the same binary length as the old value) but also appending 
> updates with invalidation of the old value (if the binary length of the new 
> value differs). The hash table needs to be able to evict and emit all elements 
> if it runs out of memory.
> We should also add {{HASH}} and {{SORT}} compiler hints to 
> {{DataSet.reduce()}} and {{Grouping.reduce()}} to allow users to pick the 
> execution strategy.
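The strategy described above can be sketched as follows. This is a toy model only: Flink's actual combiner works on managed memory segments with binary records, not a java.util.HashMap, and eviction is triggered by memory pressure rather than an entry count:

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BinaryOperator;

// Toy sketch of a hash-based combiner for a ReduceFunction: the
// pre-aggregated value per key is held in a hash table and updated
// incrementally with every incoming record; when the table exceeds its
// budget, all entries are evicted (emitted downstream) and it starts fresh.
class HashCombinerSketch<K, V> {
    private final Map<K, V> table = new HashMap<>();
    private final BinaryOperator<V> reduce; // mirrors T reduce(T v1, T v2)
    private final int maxEntries;           // stand-in for running out of memory
    final List<Map.Entry<K, V>> emitted = new ArrayList<>();

    HashCombinerSketch(BinaryOperator<V> reduce, int maxEntries) {
        this.reduce = reduce;
        this.maxEntries = maxEntries;
    }

    void collect(K key, V value) {
        table.merge(key, value, reduce); // incremental update of the pre-aggregate
        if (table.size() > maxEntries) {
            evictAll();
        }
    }

    // evict and emit all elements, e.g. when memory runs out or input ends
    void evictAll() {
        for (Map.Entry<K, V> e : table.entrySet()) {
            emitted.add(new AbstractMap.SimpleEntry<>(e.getKey(), e.getValue()));
        }
        table.clear();
    }
}
```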





[jira] [Commented] (FLINK-4091) flink-connector-cassandra has conflicting guava version

2016-06-18 Thread Dominik Bruhn (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-4091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15337750#comment-15337750
 ] 

Dominik Bruhn commented on FLINK-4091:
--

Here is the stdout of the run command:
https://gist.github.com/theomega/4d1a1b83de1bbf715dd60d17b0143714

And here is the log:
https://gist.github.com/theomega/33f7f5b8d772d17c9c071614c4a73525

Could it be related to how I package the job? Currently I'm integrating the 
flink-connector-cassandra into my fat job jar. I didn't find any documentation 
on whether that is the right approach.

> flink-connector-cassandra has conflicting guava version
> ---
>
> Key: FLINK-4091
> URL: https://issues.apache.org/jira/browse/FLINK-4091
> Project: Flink
>  Issue Type: Bug
>  Components: Streaming Connectors
>Affects Versions: 1.1.0
> Environment: MacOSX, 1.10-SNAPSHOT (head is 
> 1a6bab3ef76805685044cf4521e32315169f9033)
>Reporter: Dominik Bruhn
>
> The newly merged cassandra streaming connector has an issue with its guava 
> dependency.
> The build-process for flink-connector-cassandra creates a shaded JAR file which 
> contains the connector, the datastax cassandra driver, plus a shaded copy of 
> guava in org.apache.flink.shaded.
> The datastax cassandra driver calls into Futures.withFallback ([1]), which is 
> present in this guava version. This also works inside the 
> flink-connector-cassandra jar.
> Now the actual build-process for Flink happens and builds another shaded JAR 
> and creates the flink-dist.jar. Inside this JAR, there is also a shaded 
> version of guava inside org.apache.flink.shaded.
> Now the issue: The guava version which is in the flink-dist.jar is not 
> compatible and doesn't contain the Futures.withFallback which the datastax 
> driver is using.
> This leads to the following issue: you can launch a flink task which uses the 
> cassandra driver locally (i.e., through the mini-cluster) without any problems, 
> because that path never uses the flink-dist.jar. 
> BUT: as soon as you try to start this job on a flink cluster (which 
> uses the flink-dist.jar), the job breaks with the following exception:
> https://gist.github.com/theomega/5ab9b14ffb516b15814de28e499b040d
> You can inspect this by opening the 
> flink-connector-cassandra_2.11-1.1-SNAPSHOT.jar and the 
> flink-dist_2.11-1.1-SNAPSHOT.jar in a java decompiler.
> I don't know a good solution here: perhaps one option would be to shade 
> the guava for the cassandra-driver somewhere other than 
> org.apache.flink.shaded.
> [1]: 
> https://google.github.io/guava/releases/19.0/api/docs/com/google/common/util/concurrent/Futures.html#withFallback(com.google.common.util.concurrent.ListenableFuture,
>  com.google.common.util.concurrent.FutureFallback, 
> java.util.concurrent.Executor)





[jira] [Commented] (FLINK-1003) Spread out scheduling strategy

2016-06-18 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1003?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15337744#comment-15337744
 ] 

ASF GitHub Bot commented on FLINK-1003:
---

GitHub user gyfora opened a pull request:

https://github.com/apache/flink/pull/2129

[FLINK-1003] [WIP] Spread out scheduling of tasks

This is a work-in-progress PR with the core functionality implemented but 
no tests yet.

As this is a highly critical part of the system, I would like to get some 
initial feedback before proceeding to write / change a huge amount of tests :)

About the functionality:

This is an adaptation of https://github.com/apache/flink/pull/60 to the 
current flink scheduler. Instead of preferring local instances when scheduling 
new task slots, the new scheduling strategy allows users to balance the load 
across the different task managers.

Every time a new task needs to be scheduled, the scheduler considers all 
instances that satisfy the scheduling constraints (available slots + 
locality constraints) and picks the one with the smallest load. Load is 
calculated as the percentage of task slots occupied in a given task manager.
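The load comparison described above amounts to picking the candidate with the smallest occupied-slot fraction. A minimal sketch, with made-up class names standing in for the scheduler's instance bookkeeping:

```java
import java.util.List;

// Sketch of the spread-out selection rule: load = occupied / total slots,
// and among all feasible candidates the least loaded instance wins.
class SpreadOutSketch {
    static final class Instance {
        final String id;
        final int totalSlots;
        final int occupiedSlots;
        Instance(String id, int totalSlots, int occupiedSlots) {
            this.id = id;
            this.totalSlots = totalSlots;
            this.occupiedSlots = occupiedSlots;
        }
        double load() { return (double) occupiedSlots / totalSlots; }
    }

    // candidates are assumed to already satisfy slot-availability and
    // locality constraints, as in the description above
    static Instance pickLeastLoaded(List<Instance> candidates) {
        Instance best = null;
        for (Instance i : candidates) {
            if (best == null || i.load() < best.load()) best = i;
        }
        return best;
    }
}
```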

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/gyfora/flink scheduling

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/2129.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2129


commit f895fd71f392482cf0a50e32dc637f7885995c4e
Author: Gyula Fora 
Date:   2016-06-18T10:19:13Z

[FLINK-1003] Spread out scheduling of tasks




> Spread out scheduling strategy
> --
>
> Key: FLINK-1003
> URL: https://issues.apache.org/jira/browse/FLINK-1003
> Project: Flink
>  Issue Type: Improvement
>Reporter: Till Rohrmann
>Assignee: Gyula Fora
>
> Currently the Flink scheduler tries to fill one instance completely before 
> the tasks are deployed to another instance. This is a good behaviour in 
> multi-user and multi-job scenarios but it wastes resources if one wants to 
> use the complete cluster. Therefore, another scheduling strategy where the 
> load among the different instances is kept balanced might be useful. This 
> spread out strategy will deploy the tasks such that the overall work is 
> equally distributed.





[GitHub] flink pull request #2129: [FLINK-1003] [WIP] Spread out scheduling of tasks

2016-06-18 Thread gyfora
GitHub user gyfora opened a pull request:

https://github.com/apache/flink/pull/2129

[FLINK-1003] [WIP] Spread out scheduling of tasks

This is a work-in-progress PR with the core functionality implemented but 
no tests yet.

As this is a highly critical part of the system, I would like to get some 
initial feedback before proceeding to write / change a huge amount of tests :)

About the functionality:

This is an adaptation of https://github.com/apache/flink/pull/60 to the 
current flink scheduler. Instead of preferring local instances when scheduling 
new task slots, the new scheduling strategy allows users to balance the load 
across the different task managers.

Every time a new task needs to be scheduled, the scheduler considers all 
instances that satisfy the scheduling constraints (available slots + 
locality constraints) and picks the one with the smallest load. Load is 
calculated as the percentage of task slots occupied in a given task manager.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/gyfora/flink scheduling

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/2129.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2129


commit f895fd71f392482cf0a50e32dc637f7885995c4e
Author: Gyula Fora 
Date:   2016-06-18T10:19:13Z

[FLINK-1003] Spread out scheduling of tasks






[jira] [Assigned] (FLINK-1003) Spread out scheduling strategy

2016-06-18 Thread Gyula Fora (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-1003?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gyula Fora reassigned FLINK-1003:
-

Assignee: Gyula Fora

> Spread out scheduling strategy
> --
>
> Key: FLINK-1003
> URL: https://issues.apache.org/jira/browse/FLINK-1003
> Project: Flink
>  Issue Type: Improvement
>Reporter: Till Rohrmann
>Assignee: Gyula Fora
>
> Currently the Flink scheduler tries to fill one instance completely before 
> the tasks are deployed to another instance. This is a good behaviour in 
> multi-user and multi-job scenarios but it wastes resources if one wants to 
> use the complete cluster. Therefore, another scheduling strategy where the 
> load among the different instances is kept balanced might be useful. This 
> spread out strategy will deploy the tasks such that the overall work is 
> equally distributed.





[jira] [Commented] (FLINK-3677) FileInputFormat: Allow to specify include/exclude file name patterns

2016-06-18 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15337682#comment-15337682
 ] 

ASF GitHub Bot commented on FLINK-3677:
---

Github user zentol commented on the issue:

https://github.com/apache/flink/pull/2109
  
Yes, I think so.

I have 2 more issues with this PR.

For one, the documentation isn't in the correct place, I believe. The 
config.md file specifies configuration parameters for Flink as a system. These 
parameters are set in flink-conf.yaml and are not passed to the 
InputFormat's configure() method. I believe the keys belong in the DataSource 
section of the Batch Guide, as users have to set the keys using the 
DataSource#withParameters() method.

Which brings me to the second issue: the configure() method is a somewhat 
antiquated way for users to configure IO-Formats. We now usually use additional 
arguments in the readTextFile() methods, or additional configuration methods 
like in readCsvFile(). That said, we can't modify the methods as the ExEnv is 
@Public, and we can't add another 3-4 variants of readTextFile. We also can't 
change the return type of readTextFile from DataSource to a more useful 
TextReader, again because it's @Public.

I'm not sure myself whether it really is an issue, and if so, how to resolve 
it. Thus I would like others to weigh in on this one before proceeding further.


> FileInputFormat: Allow to specify include/exclude file name patterns
> 
>
> Key: FLINK-3677
> URL: https://issues.apache.org/jira/browse/FLINK-3677
> Project: Flink
>  Issue Type: Improvement
>  Components: Core
>Affects Versions: 1.0.0
>Reporter: Maximilian Michels
>Assignee: Ivan Mushketyk
>Priority: Minor
>  Labels: starter
>
> It would be nice to be able to specify a regular expression to filter files.
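The include/exclude filtering requested here can be sketched with plain java.nio glob matchers, independent of Flink's FileInputFormat API. The class and method names below are illustrative, not part of the PR under discussion:

```java
import java.nio.file.FileSystems;
import java.nio.file.PathMatcher;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;

/**
 * Illustrative sketch of include/exclude file-name filtering using
 * java.nio glob patterns (not Flink's actual FileInputFormat code).
 */
public class PatternFilterSketch {

    /** Keeps names matching includeGlob and not matching excludeGlob. */
    public static List<String> filter(List<String> names,
                                      String includeGlob, String excludeGlob) {
        PathMatcher include =
                FileSystems.getDefault().getPathMatcher("glob:" + includeGlob);
        PathMatcher exclude =
                FileSystems.getDefault().getPathMatcher("glob:" + excludeGlob);
        return names.stream()
                .filter(n -> include.matches(Paths.get(n)))   // must match include
                .filter(n -> !exclude.matches(Paths.get(n)))  // must not match exclude
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> files = List.of("a.csv", "b.csv", "_copying_b.csv", "c.txt");
        // keep *.csv, but drop in-progress files starting with "_"
        System.out.println(filter(files, "*.csv", "_*"));
        // → [a.csv, b.csv]
    }
}
```

A filter like this would typically run while the format enumerates input splits, so excluded files are never opened at all.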



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink issue #2109: [FLINK-3677] FileInputFormat: Allow to specify include/ex...

2016-06-18 Thread zentol
Github user zentol commented on the issue:

https://github.com/apache/flink/pull/2109
  
Yes, I think so.

I have two more issues with this PR.

For one, I believe the documentation isn't in the correct place. The 
config.md file specifies configuration parameters for Flink as a system. These 
parameters are set in flink-conf.yaml and are not passed to the InputFormat's 
configure() method. I believe the keys belong in the DataSource section of the 
Batch Guide, since users have to set them via the DataSource#withParameters() 
method.

Which brings me to the second issue: the configure() method is a somewhat 
antiquated way for users to configure input formats. We now usually use 
additional arguments in the readTextFile() methods, or additional configuration 
methods as in readCsvFile(). That said, we can't modify the existing methods 
because the ExecutionEnvironment is @Public, and we can't add another 3-4 
variants of readTextFile(). We also can't change the return type of 
readTextFile() from DataSource to a more useful TextReader, again because it's 
@Public.

I'm not sure myself whether this really is an issue, and if so how to resolve 
it. Thus, I would like others to weigh in on this one before proceeding further.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-4091) flink-connector-cassandra has conflicting guava version

2016-06-18 Thread Chesnay Schepler (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-4091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15337620#comment-15337620
 ] 

Chesnay Schepler commented on FLINK-4091:
-

Thank you for the clarification. Is there any chance you could upload the 
complete TaskManager log file? That might shed some light on the issue.

> flink-connector-cassandra has conflicting guava version
> ---
>
> Key: FLINK-4091
> URL: https://issues.apache.org/jira/browse/FLINK-4091
> Project: Flink
>  Issue Type: Bug
>  Components: Streaming Connectors
>Affects Versions: 1.1.0
> Environment: MacOSX, 1.1-SNAPSHOT (head is 
> 1a6bab3ef76805685044cf4521e32315169f9033)
>Reporter: Dominik Bruhn
>
> The newly merged Cassandra streaming connector has an issue with its Guava 
> dependency.
> The build process for flink-connector-cassandra creates a shaded JAR file 
> that contains the connector, the DataStax Cassandra driver, plus a shaded 
> copy of Guava under org.apache.flink.shaded.
> The DataStax Cassandra driver calls into Futures.withFallback ([1]), which is 
> present in this Guava version. This also works inside the 
> flink-connector-cassandra JAR.
> Then the actual Flink build happens, produces another shaded JAR, and creates 
> flink-dist.jar. Inside this JAR there is also a shaded version of Guava under 
> org.apache.flink.shaded.
> Now the issue: the Guava version in flink-dist.jar is not compatible and 
> doesn't contain the Futures.withFallback that the DataStax driver is using.
> This leads to the following behaviour: you can launch a Flink job that uses 
> the Cassandra driver locally (i.e. through the mini-cluster) without any 
> problems, because that never uses flink-dist.jar.
> BUT: as soon as you try to start this job on a Flink cluster (which does use 
> flink-dist.jar), the job breaks with the following exception:
> https://gist.github.com/theomega/5ab9b14ffb516b15814de28e499b040d
> You can inspect this by opening 
> flink-connector-cassandra_2.11-1.1-SNAPSHOT.jar and 
> flink-dist_2.11-1.1-SNAPSHOT.jar in a Java decompiler.
> I don't know a good solution here: perhaps one option would be to shade 
> Guava for the cassandra-driver somewhere other than org.apache.flink.shaded.
> [1]: 
> https://google.github.io/guava/releases/19.0/api/docs/com/google/common/util/concurrent/Futures.html#withFallback(com.google.common.util.concurrent.ListenableFuture,
>  com.google.common.util.concurrent.FutureFallback, 
> java.util.concurrent.Executor)
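A hedged sketch of the relocation idea from the last paragraph of the report: with the maven-shade-plugin, the connector build could relocate the driver's Guava classes into a connector-specific package so they cannot clash with the copy bundled in flink-dist.jar. The shadedPattern below is illustrative only, not the fix that was actually committed:

```xml
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <configuration>
    <relocations>
      <relocation>
        <!-- Move the driver's Guava out of org.apache.flink.shaded so it
             cannot collide with the Guava copy inside flink-dist.jar.
             The target package name here is a hypothetical example. -->
        <pattern>com.google.common</pattern>
        <shadedPattern>org.apache.flink.cassandra.shaded.com.google.common</shadedPattern>
      </relocation>
    </relocations>
  </configuration>
</plugin>
```

With a relocation like this, the shaded driver bytecode references the renamed package directly, so whichever Guava version flink-dist.jar bundles becomes irrelevant to the connector.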



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)