[jira] [Closed] (FLINK-3977) Subclasses of InternalWindowFunction must support OutputTypeConfigurable
[ https://issues.apache.org/jira/browse/FLINK-3977?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Fabian Hueske closed FLINK-3977. Resolution: Fixed Fix Version/s: 1.1.0 Fixed with 18744b2c846aa51ed317c4c7409519f25e3eafb7 > Subclasses of InternalWindowFunction must support OutputTypeConfigurable > > > Key: FLINK-3977 > URL: https://issues.apache.org/jira/browse/FLINK-3977 > Project: Flink > Issue Type: Bug > Components: Streaming >Reporter: Aljoscha Krettek >Assignee: Fabian Hueske >Priority: Critical > Fix For: 1.1.0 > > > Right now, if they wrap functions and a wrapped function implements > {{OutputTypeConfigurable}}, {{setOutputType}} is never called. This manifests > itself, for example, in FoldFunction on a window with evictor not working. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
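The bug above can be illustrated with a minimal, self-contained Java sketch. Note that `OutputTypeConfigurable` here is a simplified stand-in (taking a plain `String` instead of Flink's actual `TypeInformation`/`ExecutionConfig` signature), and `ForwardingWrapper` is a hypothetical name, not the real `InternalWindowFunction` subclass:

```java
// Illustrative stand-in for Flink's OutputTypeConfigurable; the real interface
// takes TypeInformation and ExecutionConfig arguments. Simplified assumption.
interface OutputTypeConfigurable {
    void setOutputType(String typeInfo);
}

// A wrapper in the spirit of an InternalWindowFunction subclass: it must itself
// implement OutputTypeConfigurable and forward the call to the wrapped function,
// otherwise the inner function's setOutputType is never invoked -- which is the
// failure mode described in the issue.
class ForwardingWrapper implements OutputTypeConfigurable {
    private final Object wrapped;

    ForwardingWrapper(Object wrapped) {
        this.wrapped = wrapped;
    }

    @Override
    public void setOutputType(String typeInfo) {
        // forward only if the wrapped function cares about the output type
        if (wrapped instanceof OutputTypeConfigurable) {
            ((OutputTypeConfigurable) wrapped).setOutputType(typeInfo);
        }
    }
}
```

The essential point is the `instanceof` check plus forwarding; without it, a wrapped fold function never learns its output type.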
[jira] [Closed] (FLINK-3641) Document registerCachedFile API call
[ https://issues.apache.org/jira/browse/FLINK-3641?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Fabian Hueske closed FLINK-3641. Resolution: Done Fix Version/s: 1.1.0 Done with ba62df14a52660cec85783b1070821acd144fd06 > Document registerCachedFile API call > > > Key: FLINK-3641 > URL: https://issues.apache.org/jira/browse/FLINK-3641 > Project: Flink > Issue Type: Improvement > Components: Java API, Scala API >Affects Versions: 1.1.0 >Reporter: Till Rohrmann >Assignee: Fabian Hueske >Priority: Minor > Fix For: 1.1.0 > > > Flink's stable API supports the {{registerCachedFile}} API call at the > {{ExecutionEnvironment}}. However, it is nowhere mentioned in the online > documentation. Furthermore, the {{DistributedCache}} is also not explained.
[jira] [Commented] (FLINK-3949) Collect Metrics in Runtime Operators
[ https://issues.apache.org/jira/browse/FLINK-3949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15338209#comment-15338209 ] ASF GitHub Bot commented on FLINK-3949: --- Github user asfgit closed the pull request at: https://github.com/apache/flink/pull/2119 > Collect Metrics in Runtime Operators > > > Key: FLINK-3949 > URL: https://issues.apache.org/jira/browse/FLINK-3949 > Project: Flink > Issue Type: Sub-task > Components: Local Runtime >Reporter: Stephan Ewen >Assignee: Chesnay Schepler > Fix For: 1.1.0
[jira] [Commented] (FLINK-3641) Document registerCachedFile API call
[ https://issues.apache.org/jira/browse/FLINK-3641?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15338207#comment-15338207 ] ASF GitHub Bot commented on FLINK-3641: --- Github user asfgit closed the pull request at: https://github.com/apache/flink/pull/2122
[jira] [Commented] (FLINK-3977) Subclasses of InternalWindowFunction must support OutputTypeConfigurable
[ https://issues.apache.org/jira/browse/FLINK-3977?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15338208#comment-15338208 ] ASF GitHub Bot commented on FLINK-3977: --- Github user asfgit closed the pull request at: https://github.com/apache/flink/pull/2118
[GitHub] flink pull request #2118: [FLINK-3977] InternalWindowFunctions implement Out...
Github user asfgit closed the pull request at: https://github.com/apache/flink/pull/2118 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] flink pull request #2122: [FLINK-3641] Add documentation for DataSet distrib...
Github user asfgit closed the pull request at: https://github.com/apache/flink/pull/2122
[GitHub] flink pull request #2119: [FLINK-3949] Add numSplitsProcessed (Streaming)
Github user asfgit closed the pull request at: https://github.com/apache/flink/pull/2119
[jira] [Commented] (FLINK-3641) Document registerCachedFile API call
[ https://issues.apache.org/jira/browse/FLINK-3641?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15338205#comment-15338205 ] ASF GitHub Bot commented on FLINK-3641: --- Github user fhueske commented on the issue: https://github.com/apache/flink/pull/2122 Merging
[GitHub] flink issue #2122: [FLINK-3641] Add documentation for DataSet distributed ca...
Github user fhueske commented on the issue: https://github.com/apache/flink/pull/2122 Merging
[jira] [Commented] (FLINK-3859) Add BigDecimal/BigInteger support to Table API
[ https://issues.apache.org/jira/browse/FLINK-3859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15338204#comment-15338204 ] ASF GitHub Bot commented on FLINK-3859: --- Github user fhueske commented on the issue: https://github.com/apache/flink/pull/2088 Won't merge this PR at this point. A few Travis builds timed out and I found that the added expression tests increase build time by about one minute due to the code-gen compilation overhead. @twalthr, do you think we can adapt the tests to batch several expressions into a single class with multiple methods to invoke the compiler less frequently? > Add BigDecimal/BigInteger support to Table API > -- > > Key: FLINK-3859 > URL: https://issues.apache.org/jira/browse/FLINK-3859 > Project: Flink > Issue Type: New Feature > Components: Table API & SQL >Reporter: Timo Walther >Assignee: Timo Walther >Priority: Critical > > Since FLINK-3786 has been solved, we can now start integrating > BigDecimal/BigInteger into the Table API.
[GitHub] flink issue #2088: [FLINK-3859] [table] Add BigDecimal/BigInteger support to...
Github user fhueske commented on the issue: https://github.com/apache/flink/pull/2088 Won't merge this PR at this point. A few Travis builds timed out and I found that the added expression tests increase build time by about one minute due to the code-gen compilation overhead. @twalthr, do you think we can adapt the tests to batch several expressions into a single class with multiple methods to invoke the compiler less frequently?
[jira] [Commented] (FLINK-2618) ExternalSortITCase failure
[ https://issues.apache.org/jira/browse/FLINK-2618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15338186#comment-15338186 ] Kostas Kloudas commented on FLINK-2618: --- And another one: https://s3.amazonaws.com/archive.travis-ci.org/jobs/138556355/log.txt > ExternalSortITCase failure > -- > > Key: FLINK-2618 > URL: https://issues.apache.org/jira/browse/FLINK-2618 > Project: Flink > Issue Type: Bug >Reporter: Sachin Goel > Labels: test-stability > > {{ExternalSortITCase.testSpillingSortWithIntermediateMerge}} fails with the > exception: > {code} > org.apache.flink.types.NullKeyFieldException: Field 0 is null, but expected > to hold a key. > at > org.apache.flink.api.common.typeutils.record.RecordComparator.setReference(RecordComparator.java:212) > at > org.apache.flink.api.common.typeutils.record.RecordComparator.setReference(RecordComparator.java:40) > at > org.apache.flink.runtime.operators.sort.MergeIterator$HeadStream.nextHead(MergeIterator.java:127) > at > org.apache.flink.runtime.operators.sort.MergeIterator.next(MergeIterator.java:88) > at > org.apache.flink.runtime.operators.sort.MergeIterator.next(MergeIterator.java:69) > at > org.apache.flink.runtime.operators.sort.ExternalSortITCase.testSpillingSortWithIntermediateMerge(ExternalSortITCase.java:301) > {code} > Here is the build log: > https://s3.amazonaws.com/archive.travis-ci.org/jobs/78579607/log.txt
[jira] [Commented] (FLINK-3677) FileInputFormat: Allow to specify include/exclude file name patterns
[ https://issues.apache.org/jira/browse/FLINK-3677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15338100#comment-15338100 ] ASF GitHub Bot commented on FLINK-3677: --- Github user zentol commented on the issue: https://github.com/apache/flink/pull/2109 There are a number of unstable tests that we are aware of. Rebasing to the latest master can reduce these issues, but they aren't completely resolved yet. > FileInputFormat: Allow to specify include/exclude file name patterns > > > Key: FLINK-3677 > URL: https://issues.apache.org/jira/browse/FLINK-3677 > Project: Flink > Issue Type: Improvement > Components: Core >Affects Versions: 1.0.0 >Reporter: Maximilian Michels >Assignee: Ivan Mushketyk >Priority: Minor > Labels: starter > > It would be nice to be able to specify a regular expression to filter files.
[GitHub] flink issue #2109: [FLINK-3677] FileInputFormat: Allow to specify include/ex...
Github user zentol commented on the issue: https://github.com/apache/flink/pull/2109 There are a number of unstable tests that we are aware of. Rebasing to the latest master can reduce these issues, but they aren't completely resolved yet.
[jira] [Commented] (FLINK-3677) FileInputFormat: Allow to specify include/exclude file name patterns
[ https://issues.apache.org/jira/browse/FLINK-3677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15338077#comment-15338077 ] ASF GitHub Bot commented on FLINK-3677: --- Github user mushketyk commented on the issue: https://github.com/apache/flink/pull/2109 By the way, the CI build seems to fail pretty much randomly. Do I need to rebase my changes onto a later revision to solve this?
[GitHub] flink issue #2109: [FLINK-3677] FileInputFormat: Allow to specify include/ex...
Github user mushketyk commented on the issue: https://github.com/apache/flink/pull/2109 By the way, the CI build seems to fail pretty much randomly. Do I need to rebase my changes onto a later revision to solve this?
[jira] [Commented] (FLINK-3677) FileInputFormat: Allow to specify include/exclude file name patterns
[ https://issues.apache.org/jira/browse/FLINK-3677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15338076#comment-15338076 ] ASF GitHub Bot commented on FLINK-3677: --- Github user mushketyk commented on the issue: https://github.com/apache/flink/pull/2109 Ok, I'll update my PR when others comment on it.
[GitHub] flink issue #2109: [FLINK-3677] FileInputFormat: Allow to specify include/ex...
Github user mushketyk commented on the issue: https://github.com/apache/flink/pull/2109 Ok, I'll update my PR when others comment on it.
[jira] [Commented] (FLINK-4091) flink-connector-cassandra has conflicting guava version
[ https://issues.apache.org/jira/browse/FLINK-4091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15337971#comment-15337971 ] Chesnay Schepler commented on FLINK-4091: - If I recall correctly, that's just what we did, but it would be interesting to see what would happen if the cassandra jar is available under /lib and not contained in the fat-jar. Shading it to a different namespace would be the easiest solution, but I'm still curious why this popped up now. > flink-connector-cassandra has conflicting guava version > --- > > Key: FLINK-4091 > URL: https://issues.apache.org/jira/browse/FLINK-4091 > Project: Flink > Issue Type: Bug > Components: Streaming Connectors >Affects Versions: 1.1.0 > Environment: MacOSX, 1.10-SNAPSHOT (head is > 1a6bab3ef76805685044cf4521e32315169f9033) >Reporter: Dominik Bruhn > > The newly merged cassandra streaming connector has an issue with its guava > dependency. > The build process for flink-connector-cassandra creates a shaded JAR file which > contains the connector, the datastax cassandra driver, plus, in > org.apache.flink.shaded, a shaded copy of guava. > The datastax cassandra driver calls into Futures.withFallback ([1]), which is > present in this guava version. This also works inside the > flink-connector-cassandra jar. > Now the actual build process for Flink happens, builds another shaded JAR, > and creates the flink-dist.jar. Inside this JAR, there is also a shaded > version of guava inside org.apache.flink.shaded. > Now the issue: the guava version in the flink-dist.jar is not > compatible and doesn't contain the Futures.withFallback that the datastax > driver is using. > This leads to the following issue: you can launch a > flink task which uses the cassandra driver locally (i.e. through the > mini-cluster) without any problems, because that never uses the flink-dist.jar.
> BUT: as soon as you try to start this job on a flink cluster (which > uses the flink-dist.jar), the job breaks with the following exception: > https://gist.github.com/theomega/5ab9b14ffb516b15814de28e499b040d > You can inspect this by opening the > flink-connector-cassandra_2.11-1.1-SNAPSHOT.jar and the > flink-dist_2.11-1.1-SNAPSHOT.jar in a java decompiler. > I don't know a good solution here: perhaps one solution would be to shade > the guava for the cassandra-driver somewhere other than at > org.apache.flink.shaded. > [1]: > https://google.github.io/guava/releases/19.0/api/docs/com/google/common/util/concurrent/Futures.html#withFallback(com.google.common.util.concurrent.ListenableFuture, > com.google.common.util.concurrent.FutureFallback, > java.util.concurrent.Executor)
[jira] [Updated] (FLINK-3801) Upgrade Joda-Time library to 2.9.3
[ https://issues.apache.org/jira/browse/FLINK-3801?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated FLINK-3801: -- Description: Currently Joda-Time 2.5 is used, which is very old. We should upgrade to 2.9.3 was: Currently yoda-time 2.5 is used which was very old. We should upgrade to 2.9.3 > Upgrade Joda-Time library to 2.9.3 > -- > > Key: FLINK-3801 > URL: https://issues.apache.org/jira/browse/FLINK-3801 > Project: Flink > Issue Type: Improvement >Reporter: Ted Yu >Priority: Minor > > Currently Joda-Time 2.5 is used, which is very old. > We should upgrade to 2.9.3
[GitHub] flink issue #1517: [FLINK-3477] [runtime] Add hash-based combine strategy fo...
Github user ggevay commented on the issue: https://github.com/apache/flink/pull/1517 I have rebased to the current master.
[jira] [Commented] (FLINK-3477) Add hash-based combine strategy for ReduceFunction
[ https://issues.apache.org/jira/browse/FLINK-3477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15337835#comment-15337835 ] ASF GitHub Bot commented on FLINK-3477: --- Github user ggevay commented on the issue: https://github.com/apache/flink/pull/1517 I have rebased to the current master. > Add hash-based combine strategy for ReduceFunction > -- > > Key: FLINK-3477 > URL: https://issues.apache.org/jira/browse/FLINK-3477 > Project: Flink > Issue Type: Sub-task > Components: Local Runtime >Reporter: Fabian Hueske >Assignee: Gabor Gevay > > This issue is about adding a hash-based combine strategy for ReduceFunctions. > The interface of the {{reduce()}} method is as follows: > {code} > public T reduce(T v1, T v2) > {code} > Input type and output type are identical and the function returns only a > single value. A ReduceFunction is incrementally applied to compute a final > aggregated value. This allows holding the pre-aggregated value in a hash table > and updating it with each function call. > The hash-based strategy requires a special implementation of an in-memory hash > table. The hash table should support in-place updates of elements (if the > new value has the same size as the old value) but also appending updates > with invalidation of the old value (if the binary length of the new value > differs). The hash table needs to be able to evict and emit all elements if > it runs out of memory. > We should also add {{HASH}} and {{SORT}} compiler hints to > {{DataSet.reduce()}} and {{Grouping.reduce()}} to allow users to pick the > execution strategy.
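The update pattern described in the issue can be sketched in a few lines of plain Java. This is only an illustration of why `reduce(T, T) -> T` lends itself to hash-based combining; Flink's real implementation uses a managed-memory hash table with in-place updates and eviction, not a `HashMap`, and the class name here is hypothetical:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.BinaryOperator;

// Sketch of a hash-based combiner: because input and output types of reduce()
// are identical, each key's pre-aggregate can be held in a hash table and
// merged incrementally with every incoming record.
class HashCombiner<K, T> {
    private final Map<K, T> table = new HashMap<>();
    private final BinaryOperator<T> reducer;

    HashCombiner(BinaryOperator<T> reducer) {
        this.reducer = reducer;
    }

    void add(K key, T value) {
        // combine the new value with the existing pre-aggregate, if any
        table.merge(key, value, reducer);
    }

    // in the real strategy this would also be triggered when memory runs out
    Map<K, T> emit() {
        return table;
    }
}
```

A sort-based combiner must first group records by key; the hash-based variant trades that sort for constant-time per-record updates at the cost of the in-memory table.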
[jira] [Commented] (FLINK-4091) flink-connector-cassandra has conflicting guava version
[ https://issues.apache.org/jira/browse/FLINK-4091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15337750#comment-15337750 ] Dominik Bruhn commented on FLINK-4091: -- Here is the stdout of the run command: https://gist.github.com/theomega/4d1a1b83de1bbf715dd60d17b0143714 And here is the log: https://gist.github.com/theomega/33f7f5b8d772d17c9c071614c4a73525 Could it be related to how I package the job? Currently I'm integrating the flink-connector-cassandra into my fat job jar. I didn't find any documentation on whether that is the right approach.
[jira] [Commented] (FLINK-1003) Spread out scheduling strategy
[ https://issues.apache.org/jira/browse/FLINK-1003?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15337744#comment-15337744 ] ASF GitHub Bot commented on FLINK-1003: --- GitHub user gyfora opened a pull request: https://github.com/apache/flink/pull/2129 [FLINK-1003] [WIP] Spread out scheduling of tasks This is a work-in-progress PR with the core functionality implemented but no tests yet. As this is a highly critical part of the system, I would like to get some initial feedback before proceeding to write / change a huge amount of tests :) About the functionality: This is an adaptation of https://github.com/apache/flink/pull/60 to the current flink scheduler. Instead of preferring local instances when scheduling new task slots, the new scheduling strategy allows users to balance the load on the different task managers. Every time a new task needs to be scheduled, the scheduler considers all instances that satisfy the scheduling constraints (has available slots + locality constraints) and picks the one with the smallest load. Load is calculated as the percentage of task slots occupied in a given task manager. You can merge this pull request into a Git repository by running: $ git pull https://github.com/gyfora/flink scheduling Alternatively you can review and apply these changes as the patch at: https://github.com/apache/flink/pull/2129.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #2129 commit f895fd71f392482cf0a50e32dc637f7885995c4e Author: Gyula Fora Date: 2016-06-18T10:19:13Z [FLINK-1003] Spread out scheduling of tasks > Spread out scheduling strategy > -- > > Key: FLINK-1003 > URL: https://issues.apache.org/jira/browse/FLINK-1003 > Project: Flink > Issue Type: Improvement >Reporter: Till Rohrmann >Assignee: Gyula Fora > > Currently the Flink scheduler tries to fill one instance completely before > the tasks are deployed to another instance. This is a good behaviour in > multi-user and multi-job scenarios, but it wastes resources if one wants to > use the complete cluster. Therefore, another scheduling strategy where the > load among the different instances is kept balanced might be useful. This > spread-out strategy will deploy the tasks such that the overall work is > equally distributed.
[GitHub] flink pull request #2129: [FLINK-1003] [WIP] Spread out scheduling of tasks
GitHub user gyfora opened a pull request: https://github.com/apache/flink/pull/2129 [FLINK-1003] [WIP] Spread out scheduling of tasks This is a work-in-progress PR with the core functionality implemented but no tests yet. As this is a highly critical part of the system, I would like to get some initial feedback before proceeding to write / change a huge amount of tests :) About the functionality: This is an adaptation of https://github.com/apache/flink/pull/60 to the current flink scheduler. Instead of preferring local instances when scheduling new task slots, the new scheduling strategy allows users to balance the load on the different task managers. Every time a new task needs to be scheduled, the scheduler considers all instances that satisfy the scheduling constraints (has available slots + locality constraints) and picks the one with the smallest load. Load is calculated as the percentage of task slots occupied in a given task manager. You can merge this pull request into a Git repository by running: $ git pull https://github.com/gyfora/flink scheduling Alternatively you can review and apply these changes as the patch at: https://github.com/apache/flink/pull/2129.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #2129 commit f895fd71f392482cf0a50e32dc637f7885995c4e Author: Gyula Fora Date: 2016-06-18T10:19:13Z [FLINK-1003] Spread out scheduling of tasks
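The selection rule the PR describes (among candidates with free slots, pick the instance with the smallest load, where load = occupied slots / total slots) can be sketched in plain Java. The `Instance` type below is a simplified stand-in, not Flink's `org.apache.flink.runtime.instance.Instance`, and the class name is hypothetical:

```java
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

// Sketch of the spread-out scheduling decision: balance work across
// TaskManagers instead of filling one instance completely before the next.
class SpreadOutChooser {

    static class Instance {
        final String id;
        final int totalSlots;
        final int occupiedSlots;

        Instance(String id, int totalSlots, int occupiedSlots) {
            this.id = id;
            this.totalSlots = totalSlots;
            this.occupiedSlots = occupiedSlots;
        }

        // load = percentage of task slots occupied on this TaskManager
        double load() {
            return (double) occupiedSlots / totalSlots;
        }

        boolean hasFreeSlot() {
            return occupiedSlots < totalSlots;
        }
    }

    // among all candidates satisfying the constraints, pick the least loaded
    static Optional<Instance> choose(List<Instance> candidates) {
        return candidates.stream()
                .filter(Instance::hasFreeSlot)
                .min(Comparator.comparingDouble(Instance::load));
    }
}
```

In the real scheduler the candidate list would already be filtered by locality constraints; only the load comparison is shown here.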
[jira] [Assigned] (FLINK-1003) Spread out scheduling strategy
[ https://issues.apache.org/jira/browse/FLINK-1003?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gyula Fora reassigned FLINK-1003: - Assignee: Gyula Fora
[jira] [Commented] (FLINK-3677) FileInputFormat: Allow to specify include/exclude file name patterns
[ https://issues.apache.org/jira/browse/FLINK-3677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15337682#comment-15337682 ] ASF GitHub Bot commented on FLINK-3677: --- Github user zentol commented on the issue: https://github.com/apache/flink/pull/2109 Yes, I think so. I have 2 more issues with this PR. For one, the documentation isn't in the correct place, I believe. The config.md file specifies configuration parameters for Flink as a system. These parameters are set in the flink-conf.yaml, and are not passed to the InputFormat's configure() method. I believe the keys belong in the DataSource section of the Batch Guide, as users have to set the keys using the DataSource#withParameters() method. Which brings me to the second issue: the configure() method is a somewhat antiquated way for users to configure IO-Formats. We now usually use additional arguments in the readTextFile() methods, or additional configuration methods like in readCsvFile(). That said, we can't modify the methods as the ExEnv is @Public, and we can't add another 3-4 variants of readTextFile. We also can't change the return type of readTextFile from DataSource to a more useful TextReader, again because it's @Public. I'm not sure myself whether it really is an issue, and if so how to resolve it. Thus I would like others to weigh in on this one before proceeding further.
[GitHub] flink issue #2109: [FLINK-3677] FileInputFormat: Allow to specify include/ex...
Github user zentol commented on the issue: https://github.com/apache/flink/pull/2109 Yes, I think so. I have 2 more issues with this PR. For one, the documentation isn't in the correct place, I believe. The config.md file specifies configuration parameters for Flink as a system. These parameters are set in the flink-conf.yaml, and are not passed to the InputFormat's configure() method. I believe the keys belong in the DataSource section of the Batch Guide, as users have to set the keys using the DataSource#withParameters() method. Which brings me to the second issue: the configure() method is a somewhat antiquated way for users to configure IO-Formats. We now usually use additional arguments in the readTextFile() methods, or additional configuration methods like in readCsvFile(). That said, we can't modify the methods as the ExEnv is @Public, and we can't add another 3-4 variants of readTextFile. We also can't change the return type of readTextFile from DataSource to a more useful TextReader, again because it's @Public. I'm not sure myself whether it really is an issue, and if so how to resolve it. Thus I would like others to weigh in on this one before proceeding further.
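Independent of where the configuration keys end up, the include/exclude filtering that FLINK-3677 proposes can be sketched with glob matchers from the JDK. This is only an illustration of the accept/reject decision; the PR's actual configuration keys, API, and pattern syntax are not reproduced here, and the class name is hypothetical:

```java
import java.nio.file.FileSystems;
import java.nio.file.PathMatcher;
import java.nio.file.Paths;

// Sketch of include/exclude file name filtering for a FileInputFormat-like
// reader: a file is read only if it matches the include pattern and does not
// match the exclude pattern.
class FilePatternFilter {
    private final PathMatcher include;
    private final PathMatcher exclude;

    FilePatternFilter(String includeGlob, String excludeGlob) {
        this.include = FileSystems.getDefault().getPathMatcher("glob:" + includeGlob);
        this.exclude = FileSystems.getDefault().getPathMatcher("glob:" + excludeGlob);
    }

    boolean accept(String fileName) {
        return include.matches(Paths.get(fileName))
                && !exclude.matches(Paths.get(fileName));
    }
}
```

For example, a filter built with include `*.csv` and exclude `_*` would read `data.csv` but skip `_hidden.csv` and `notes.txt`.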
[jira] [Commented] (FLINK-4091) flink-connector-cassandra has conflicting guava version
[ https://issues.apache.org/jira/browse/FLINK-4091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15337620#comment-15337620 ] Chesnay Schepler commented on FLINK-4091: - Thank you for the clarification. Is there any chance you could upload the complete TaskManager log file? That might shed some light on the issue.