[jira] [Commented] (FLINK-2127) The GSA Documentation has trailing </p>s
[ https://issues.apache.org/jira/browse/FLINK-2127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14568721#comment-14568721 ]

Maximilian Michels commented on FLINK-2127:
-------------------------------------------

[~uce] Actually I cannot, because I have no access to the Apache infrastructure. I could ask one of the infra people, though. I managed to reproduce and solve the problem locally. This is actually an issue with the manually inserted line break {{<br>}}. Those shouldn't be used anyway. Rebuilding our docs at the moment to check whether the issue has been resolved.

> The GSA Documentation has trailing </p>s
> ----------------------------------------
>
>                 Key: FLINK-2127
>                 URL: https://issues.apache.org/jira/browse/FLINK-2127
>             Project: Flink
>          Issue Type: Bug
>          Components: Documentation, Gelly
>    Affects Versions: 0.9
>            Reporter: Andra Lungu
>            Priority: Minor
>
> Within the GSA section of the documentation, there are trailing <p class="text-center"> ... </p> tags. It would be nice to remove them :)

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Commented] (FLINK-2069) writeAsCSV function in DataStream Scala API creates no file
[ https://issues.apache.org/jira/browse/FLINK-2069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14568786#comment-14568786 ]

Aljoscha Krettek commented on FLINK-2069:
-----------------------------------------

Yes, you are right, but I think in the `CsvOutputFormat` the buffer is just used because all the fields and field separators are written to the output stream separately. It still doesn't fix the bug that [~fobeligi] reported, however.

> writeAsCSV function in DataStream Scala API creates no file
> -----------------------------------------------------------
>
>                 Key: FLINK-2069
>                 URL: https://issues.apache.org/jira/browse/FLINK-2069
>             Project: Flink
>          Issue Type: Bug
>          Components: Streaming
>            Reporter: Faye Beligianni
>            Priority: Blocker
>              Labels: Streaming
>             Fix For: 0.9
>
> When the {{writeAsCSV}} function is used in the DataStream Scala API, no file is created in the specified path.
[GitHub] flink pull request: [FLINK-2103] Expose partitionBy to user
Github user mbalassi commented on the pull request:

    https://github.com/apache/flink/pull/743#issuecomment-107875574

    While merging I realized that it is missing from the windowed and connected datastreams and from the Scala API. Adding those as well.

---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---
[jira] [Commented] (FLINK-1731) Add kMeans clustering algorithm to machine learning library
[ https://issues.apache.org/jira/browse/FLINK-1731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14568664#comment-14568664 ]

ASF GitHub Bot commented on FLINK-1731:
---------------------------------------

Github user sachingoel0101 commented on the pull request:

    https://github.com/apache/flink/pull/700#issuecomment-107831123

    Hey guys. You might wanna look at the initialization schemes here: https://github.com/apache/flink/pull/757

> Add kMeans clustering algorithm to machine learning library
> -----------------------------------------------------------
>
>                 Key: FLINK-1731
>                 URL: https://issues.apache.org/jira/browse/FLINK-1731
>             Project: Flink
>          Issue Type: New Feature
>          Components: Machine Learning Library
>            Reporter: Till Rohrmann
>            Assignee: Peter Schrott
>              Labels: ML
>
> The Flink repository already contains a kMeans implementation, but it is not yet ported to the machine learning library. I assume that only the used data types have to be adapted and then it can be more or less directly moved to flink-ml. The kMeans++ [1] and kMeans|| [2] algorithms constitute a better implementation because they improve the initial seeding phase to achieve near-optimal clustering. It might be worthwhile to implement kMeans||.
>
> Resources:
> [1] http://ilpubs.stanford.edu:8090/778/1/2006-13.pdf
> [2] http://theory.stanford.edu/~sergei/papers/vldb12-kmpar.pdf
[jira] [Commented] (FLINK-2132) Java version parsing is not working for OpenJDK
[ https://issues.apache.org/jira/browse/FLINK-2132?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14568805#comment-14568805 ]

Ufuk Celebi commented on FLINK-2132:
------------------------------------

The OpenJDK 7 version string looks as expected:

{code}
java version "1.7.0_75"
OpenJDK Runtime Environment (IcedTea 2.5.4) (Arch Linux build 7.u75_2.5.4-1-x86_64)
OpenJDK 64-Bit Server VM (build 24.75-b04, mixed mode)
{code}

> Java version parsing is not working for OpenJDK
> -----------------------------------------------
>
>                 Key: FLINK-2132
>                 URL: https://issues.apache.org/jira/browse/FLINK-2132
>             Project: Flink
>          Issue Type: Bug
>          Components: Start-Stop Scripts
>    Affects Versions: master
>            Reporter: Ufuk Celebi
>            Priority: Critical
>
> Reported by [~aljoscha]:
> {code}
> /home/flink/flink-bin/bin/../bin/jobmanager.sh: line 32: [: openjdk version "1.8.0_40-internal": integer expression expected
> {code}
> On Ubuntu 14.10 with OpenJDK 8. The script expects a String of format
> {code}
> java version "1.8.0_20"
> Java(TM) SE Runtime Environment (build 1.8.0_20-b26)
> Java HotSpot(TM) 64-Bit Server VM (build 25.20-b23, mixed mode)
> {code}
> but the OpenJDK string looks as follows:
> {code}
> openjdk version "1.8.0_40-internal"
> OpenJDK Runtime Environment (build 1.8.0_40-internal-b09)
> OpenJDK 64-Bit Server VM (build 25.40-b13, mixed mode)
> {code}
[jira] [Assigned] (FLINK-2132) Java version parsing is not working for OpenJDK
[ https://issues.apache.org/jira/browse/FLINK-2132?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ufuk Celebi reassigned FLINK-2132:
----------------------------------

    Assignee: Ufuk Celebi
[GitHub] flink pull request: [FLINK-2103] Expose partitionBy to user
Github user mbalassi commented on the pull request:

    https://github.com/apache/flink/pull/743#issuecomment-107853134

    Merging.
[jira] [Updated] (FLINK-2080) Execute Flink with sbt
[ https://issues.apache.org/jira/browse/FLINK-2080?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Metzger updated FLINK-2080:
----------------------------------

    Component/s: Documentation

> Execute Flink with sbt
> ----------------------
>
>                 Key: FLINK-2080
>                 URL: https://issues.apache.org/jira/browse/FLINK-2080
>             Project: Flink
>          Issue Type: Improvement
>          Components: docs, Documentation
>    Affects Versions: 0.8.1
>            Reporter: Christian Wuertz
>            Priority: Minor
>
> I tried to execute some of the Flink example applications on my local machine using sbt. To get this running without class loading issues, it was important to make sure that Flink is executed in its own JVM and not in the sbt JVM. This can be done very easily, but it would have been nice to know that in advance. So maybe you guys want to add this to the Flink documentation. An example can be found here: https://github.com/Teots/flink-sbt (the trick was to add {{fork in run := true}} to the build.sbt).
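The setup described in the report boils down to one setting in `build.sbt`. A minimal sketch, assuming the project name, Scala version, and Flink artifact coordinates (these are illustrative, not taken from the linked repository):

```scala
// build.sbt -- minimal sketch; name and version values are placeholders

name := "flink-sbt-example"

scalaVersion := "2.10.4"

libraryDependencies += "org.apache.flink" % "flink-clients" % "0.8.1"

// Run the Flink program in a forked JVM instead of inside sbt's own JVM,
// which avoids the class loading issues described above.
fork in run := true
```

With this in place, `sbt run` launches the job in a separate JVM with a clean classpath.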
[GitHub] flink pull request: [FLINK-2102] [ml] Add predict operation for La...
Github user tillrohrmann commented on the pull request:

    https://github.com/apache/flink/pull/744#issuecomment-107897490

    LGTM. I will merge it as a temporary solution for the manual evaluation.
[GitHub] flink pull request: [FLINK-1731] [ml] Implementation of Feature K-...
Github user sachingoel0101 commented on the pull request:

    https://github.com/apache/flink/pull/700#issuecomment-107831123

    Hey guys. You might wanna look at the initialization schemes here: https://github.com/apache/flink/pull/757
[jira] [Updated] (FLINK-2132) Java version parsing is not working for OpenJDK
[ https://issues.apache.org/jira/browse/FLINK-2132?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ufuk Celebi updated FLINK-2132:
-------------------------------

    Fix Version/s: 0.9
[jira] [Commented] (FLINK-2103) Expose partitionBy to the user in Stream API
[ https://issues.apache.org/jira/browse/FLINK-2103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14568754#comment-14568754 ]

ASF GitHub Bot commented on FLINK-2103:
---------------------------------------

Github user mbalassi commented on the pull request:

    https://github.com/apache/flink/pull/743#issuecomment-107854661

    Good point, will do that anyway because of the renaming discussed on the mailing list.

> Expose partitionBy to the user in Stream API
> --------------------------------------------
>
>                 Key: FLINK-2103
>                 URL: https://issues.apache.org/jira/browse/FLINK-2103
>             Project: Flink
>          Issue Type: Improvement
>    Affects Versions: 0.9
>            Reporter: Aljoscha Krettek
>            Assignee: Aljoscha Krettek
>
> Is there a reason why this is not exposed to the user? I could see cases where this would be useful to have.
[GitHub] flink pull request: [FLINK-2103] Expose partitionBy to user
Github user mbalassi commented on the pull request:

    https://github.com/apache/flink/pull/743#issuecomment-107854661

    Good point, will do that anyway because of the renaming discussed on the mailing list.
[jira] [Created] (FLINK-2132) Java version parsing is not working for OpenJDK
Ufuk Celebi created FLINK-2132:
----------------------------------

             Summary: Java version parsing is not working for OpenJDK
                 Key: FLINK-2132
                 URL: https://issues.apache.org/jira/browse/FLINK-2132
             Project: Flink
          Issue Type: Bug
          Components: Start-Stop Scripts
    Affects Versions: master
            Reporter: Ufuk Celebi
            Priority: Critical

Reported by [~aljoscha]:
{code}
/home/flink/flink-bin/bin/../bin/jobmanager.sh: line 32: [: openjdk version "1.8.0_40-internal": integer expression expected
{code}
On Ubuntu 14.10 with OpenJDK 8. The script expects a String of format
{code}
java version "1.8.0_20"
Java(TM) SE Runtime Environment (build 1.8.0_20-b26)
Java HotSpot(TM) 64-Bit Server VM (build 25.20-b23, mixed mode)
{code}
but the OpenJDK string looks as follows:
{code}
openjdk version "1.8.0_40-internal"
OpenJDK Runtime Environment (build 1.8.0_40-internal-b09)
OpenJDK 64-Bit Server VM (build 25.40-b13, mixed mode)
{code}
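A parsing approach that keys on the quoted version string rather than on the vendor prefix handles both outputs above. This is a rough sketch, not the actual jobmanager.sh code; the function name is illustrative:

```shell
# Extract the Java minor version (the "8" in 1.8.0_xx) from the first line
# of `java -version` output. It keys on the quoted version string, so it
# accepts both 'java version "1.8.0_20"' (Oracle) and
# 'openjdk version "1.8.0_40-internal"' (OpenJDK).
parse_java_minor_version() {
    # $1: first line of `java -version` output
    echo "$1" | sed -n 's/.*version "\([0-9]*\)\.\([0-9]*\)\..*/\2/p'
}

parse_java_minor_version 'java version "1.8.0_20"'              # -> 8
parse_java_minor_version 'openjdk version "1.8.0_40-internal"'  # -> 8
```

Because the vendor prefix is ignored entirely, the `[: ... integer expression expected` failure from the report cannot occur: the function either yields a bare integer or the empty string.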
[jira] [Commented] (FLINK-2102) Add predict operation for LabeledVector
[ https://issues.apache.org/jira/browse/FLINK-2102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14568850#comment-14568850 ]

ASF GitHub Bot commented on FLINK-2102:
---------------------------------------

Github user tillrohrmann commented on the pull request:

    https://github.com/apache/flink/pull/744#issuecomment-107897490

    LGTM. I will merge it as a temporary solution for the manual evaluation.

> Add predict operation for LabeledVector
> ---------------------------------------
>
>                 Key: FLINK-2102
>                 URL: https://issues.apache.org/jira/browse/FLINK-2102
>             Project: Flink
>          Issue Type: Improvement
>          Components: Machine Learning Library
>            Reporter: Theodore Vasiloudis
>            Assignee: Theodore Vasiloudis
>            Priority: Minor
>              Labels: ML
>             Fix For: 0.9
>
> Currently we can only call predict on DataSet[V <: Vector]. A lot of times, though, we have a DataSet[LabeledVector] that we split into a train and a test set. We should be able to make predictions on the test DataSet[LabeledVector] without having to transform it into a DataSet[Vector].
[jira] [Commented] (FLINK-2103) Expose partitionBy to the user in Stream API
[ https://issues.apache.org/jira/browse/FLINK-2103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14568750#comment-14568750 ]

ASF GitHub Bot commented on FLINK-2103:
---------------------------------------

Github user aljoscha commented on the pull request:

    https://github.com/apache/flink/pull/743#issuecomment-107853775

    We should first update the documentation.
[GitHub] flink pull request: [FLINK-2103] Expose partitionBy to user
Github user aljoscha commented on the pull request:

    https://github.com/apache/flink/pull/743#issuecomment-107853775

    We should first update the documentation.
[jira] [Updated] (FLINK-1884) Outdated Spargel docs
[ https://issues.apache.org/jira/browse/FLINK-1884?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Metzger updated FLINK-1884:
----------------------------------

    Component/s: Documentation

> Outdated Spargel docs
> ---------------------
>
>                 Key: FLINK-1884
>                 URL: https://issues.apache.org/jira/browse/FLINK-1884
>             Project: Flink
>          Issue Type: Bug
>          Components: docs, Documentation, Spargel
>            Reporter: Vasia Kalavri
>
> It seems like the example in the Spargel guide hasn't been updated for the past few versions. The example code uses a {{SpargelIteration}}, {{FileDataSource}}, {{FileDataSink}} and creates a {{Plan}}. We could either update the example there or, since we're going to deprecate Spargel, we could remove the example from the guide and add the correct version in the Spargel-to-Gelly migration guide (FLINK-1871).
[jira] [Commented] (FLINK-2103) Expose partitionBy to the user in Stream API
[ https://issues.apache.org/jira/browse/FLINK-2103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14568742#comment-14568742 ]

ASF GitHub Bot commented on FLINK-2103:
---------------------------------------

Github user mbalassi commented on the pull request:

    https://github.com/apache/flink/pull/743#issuecomment-107853134

    Merging.
[jira] [Commented] (FLINK-2103) Expose partitionBy to the user in Stream API
[ https://issues.apache.org/jira/browse/FLINK-2103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14568812#comment-14568812 ]

ASF GitHub Bot commented on FLINK-2103:
---------------------------------------

Github user mbalassi commented on the pull request:

    https://github.com/apache/flink/pull/743#issuecomment-107875574

    While merging I realized that it is missing from the windowed and connected datastreams and from the Scala API. Adding those as well.
[jira] [Commented] (FLINK-2132) Java version parsing is not working for OpenJDK
[ https://issues.apache.org/jira/browse/FLINK-2132?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14568825#comment-14568825 ]

Matthias J. Sax commented on FLINK-2132:
----------------------------------------

Is anyone working on this? I could take care of it.
[jira] [Commented] (FLINK-2098) Checkpoint barrier initiation at source is not aligned with snapshotting
[ https://issues.apache.org/jira/browse/FLINK-2098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14568766#comment-14568766 ]

ASF GitHub Bot commented on FLINK-2098:
---------------------------------------

Github user aljoscha commented on the pull request:

    https://github.com/apache/flink/pull/742#issuecomment-107860292

    I addressed the problem with the race conditions and re-enabled the Twitter source. Which exceptions are you referring to? I don't think I touched any of the exception handling or the general way that the stream tasks work.

> Checkpoint barrier initiation at source is not aligned with snapshotting
> ------------------------------------------------------------------------
>
>                 Key: FLINK-2098
>                 URL: https://issues.apache.org/jira/browse/FLINK-2098
>             Project: Flink
>          Issue Type: Bug
>          Components: Streaming
>    Affects Versions: 0.9
>            Reporter: Stephan Ewen
>            Assignee: Aljoscha Krettek
>            Priority: Blocker
>             Fix For: 0.9
>
> The stream source does not properly align the emission of checkpoint barriers with the drawing of snapshots.
[GitHub] flink pull request: [FLINK-2098] Ensure checkpoints and element em...
Github user aljoscha commented on the pull request:

    https://github.com/apache/flink/pull/742#issuecomment-107860292

    I addressed the problem with the race conditions and re-enabled the Twitter source. Which exceptions are you referring to? I don't think I touched any of the exception handling or the general way that the stream tasks work.
[jira] [Resolved] (FLINK-2127) The GSA Documentation has trailing </p>s
[ https://issues.apache.org/jira/browse/FLINK-2127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Maximilian Michels resolved FLINK-2127.
---------------------------------------

       Resolution: Fixed
    Fix Version/s: 0.9
         Assignee: Maximilian Michels

Fixed by removing the {{<br>}} tag.
[jira] [Commented] (FLINK-2069) writeAsCSV function in DataStream Scala API creates no file
[ https://issues.apache.org/jira/browse/FLINK-2069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14568847#comment-14568847 ]

Aljoscha Krettek commented on FLINK-2069:
-----------------------------------------

Could you maybe share your code? I now have this example code and it still produces output:

{code}
DataStream<String> text = env.socketTextStream("localhost", /* port elided in original */);

IterativeDataStream<Tuple2<String, Integer>> iterate = text.flatMap(new WordCount.Tokenizer()).iterate();

SplitDataStream<Tuple2<String, Integer>> iterEnd = iterate
        .map(new MapFunction<Tuple2<String, Integer>, Tuple2<String, Integer>>() {
            private static final long serialVersionUID = 1L;

            @Override
            public Tuple2<String, Integer> map(Tuple2<String, Integer> value) throws Exception {
                return new Tuple2<String, Integer>(value.f0, value.f1 + 1);
            }
        })
        .split(new OutputSelector<Tuple2<String, Integer>>() {
            private static final long serialVersionUID = 1L;

            @Override
            public Iterable<String> select(Tuple2<String, Integer> value) {
                if (value.f1 < 10) {
                    return Lists.newArrayList("iter");
                } else {
                    return Lists.newArrayList("end");
                }
            }
        });

iterate.closeWith(iterEnd.select("iter"));

iterEnd.select("end")
        .writeAsCsv("/Users/aljoscha/Downloads/wc-csv-out", FileSystem.WriteMode.OVERWRITE)
        .setParallelism(1);
{code}

Are you using a timeout in your {{iterate()}} call?
[GitHub] flink pull request: [FLINK-1981] add support for GZIP files
Github user rmetzger commented on a diff in the pull request:

    https://github.com/apache/flink/pull/762#discussion_r31559688

    --- Diff: flink-core/src/main/java/org/apache/flink/api/common/io/FileInputFormat.java ---
    @@ -21,10 +21,16 @@
     import java.io.IOException;
     import java.util.ArrayList;
     import java.util.Arrays;
    +import java.util.HashMap;
     import java.util.HashSet;
     import java.util.List;
    +import java.util.Map;
     import java.util.Set;
    +import org.apache.commons.lang3.Validate;
    --- End diff --

    I'm really sorry that you ran into this, but the community recently decided to use Guava's Preconditions.check() instead of commons lang. Can you replace that?
[jira] [Commented] (FLINK-1981) Add GZip support
[ https://issues.apache.org/jira/browse/FLINK-1981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14569601#comment-14569601 ]

ASF GitHub Bot commented on FLINK-1981:
---------------------------------------

Github user rmetzger commented on a diff in the pull request:

    https://github.com/apache/flink/pull/762#discussion_r31560285

    --- Diff: flink-core/src/main/java/org/apache/flink/api/common/io/FileInputFormat.java ---
    @@ -628,9 +692,10 @@ public void open(FileInputSplit fileSplit) throws IOException {
      * @see org.apache.flink.api.common.io.InputStreamFSInputWrapper
      */
     protected FSDataInputStream decorateInputStream(FSDataInputStream inputStream, FileInputSplit fileSplit) throws Throwable {
    -    // Wrap stream in a extracting (decompressing) stream if file ends with ".deflate".
    -    if (fileSplit.getPath().getName().endsWith(DEFLATE_SUFFIX)) {
    -        return new InflaterInputStreamFSInputWrapper(stream);
    +    // Wrap stream in a extracting (decompressing) stream if file ends with a known compression file extension.
    +    InflaterInputStreamFactory<?> inflaterInputStreamFactory = getInflaterInputStreamFactory(fileSplit.getPath());
    +    if (inflaterInputStreamFactory != null) {
    +        return new InputStreamFSInputWrapper(inflaterInputStreamFactory.create(stream));
    --- End diff --

    So if there is no inflater input stream available, it will just fall back to the compressed data stream? Wouldn't it be better to at least log something or fail?

> Add GZip support
> ----------------
>
>                 Key: FLINK-1981
>                 URL: https://issues.apache.org/jira/browse/FLINK-1981
>             Project: Flink
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Sebastian Kruse
>            Assignee: Sebastian Kruse
>            Priority: Minor
>
> GZip, as a commonly used compression format, should be supported in the same way as the already supported deflate files. This allows to use GZip files with any subclass of FileInputFormat.
[GitHub] flink pull request: [FLINK-1981] add support for GZIP files
Github user sekruse commented on a diff in the pull request:

    https://github.com/apache/flink/pull/762#discussion_r31562256

    --- Diff: flink-core/src/main/java/org/apache/flink/api/common/io/FileInputFormat.java ---
    @@ -628,9 +692,10 @@ public void open(FileInputSplit fileSplit) throws IOException {
      * @see org.apache.flink.api.common.io.InputStreamFSInputWrapper
      */
     protected FSDataInputStream decorateInputStream(FSDataInputStream inputStream, FileInputSplit fileSplit) throws Throwable {
    -    // Wrap stream in a extracting (decompressing) stream if file ends with ".deflate".
    -    if (fileSplit.getPath().getName().endsWith(DEFLATE_SUFFIX)) {
    -        return new InflaterInputStreamFSInputWrapper(stream);
    +    // Wrap stream in a extracting (decompressing) stream if file ends with a known compression file extension.
    +    InflaterInputStreamFactory<?> inflaterInputStreamFactory = getInflaterInputStreamFactory(fileSplit.getPath());
    +    if (inflaterInputStreamFactory != null) {
    +        return new InputStreamFSInputWrapper(inflaterInputStreamFactory.create(stream));
    --- End diff --

    It might also be the case that the stream was not compressed at all. It would of course be nice to react appropriately to a missing codec, but how would we know if the current input split belongs to an uncompressed file or a compressed file with an unknown codec?
[jira] [Commented] (FLINK-1981) Add GZip support
[ https://issues.apache.org/jira/browse/FLINK-1981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14569625#comment-14569625 ]

ASF GitHub Bot commented on FLINK-1981:
---------------------------------------

Github user sekruse commented on a diff in the pull request:

    https://github.com/apache/flink/pull/762#discussion_r31562256

    --- Diff: flink-core/src/main/java/org/apache/flink/api/common/io/FileInputFormat.java ---
    @@ -628,9 +692,10 @@ public void open(FileInputSplit fileSplit) throws IOException {
      * @see org.apache.flink.api.common.io.InputStreamFSInputWrapper
      */
     protected FSDataInputStream decorateInputStream(FSDataInputStream inputStream, FileInputSplit fileSplit) throws Throwable {
    -    // Wrap stream in a extracting (decompressing) stream if file ends with ".deflate".
    -    if (fileSplit.getPath().getName().endsWith(DEFLATE_SUFFIX)) {
    -        return new InflaterInputStreamFSInputWrapper(stream);
    +    // Wrap stream in a extracting (decompressing) stream if file ends with a known compression file extension.
    +    InflaterInputStreamFactory<?> inflaterInputStreamFactory = getInflaterInputStreamFactory(fileSplit.getPath());
    +    if (inflaterInputStreamFactory != null) {
    +        return new InputStreamFSInputWrapper(inflaterInputStreamFactory.create(stream));
    --- End diff --

    It might also be the case that the stream was not compressed at all. It would of course be nice to react appropriately to a missing codec, but how would we know if the current input split belongs to an uncompressed file or a compressed file with an unknown codec?
[GitHub] flink pull request: [docs/javadoc][hotfix] Corrected Join hint and...
GitHub user andralungu opened a pull request:

    https://github.com/apache/flink/pull/763

    [docs/javadoc][hotfix] Corrected Join hint and misleading pointer to Spargel

    This PR adds the following patches:
    - in the Iteration programming guide, there was a misleading link to Spargel, which is deprecated; I switched it to point to Gelly;
    - while trying to use join hints, I saw that the JavaDoc for BROADCAST_HASH_SECOND was wrong: "Hint that the *second* join input is much smaller than the *second*." Fixed it to be smaller than the first :)

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/andralungu/flink documPatch

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/flink/pull/763.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #763

----
commit 955ed6e3106d32b22933f0ce637bf38fa2faef72
Author: andralungu <lungu.an...@gmail.com>
Date:   2015-06-02T18:58:10Z

    [docs/javadoc][hotfix] Corrected Join hint and misleading pointer to Spargel
[jira] [Commented] (FLINK-1981) Add GZip support
[ https://issues.apache.org/jira/browse/FLINK-1981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14569638#comment-14569638 ] ASF GitHub Bot commented on FLINK-1981: --- Github user rmetzger commented on a diff in the pull request: https://github.com/apache/flink/pull/762#discussion_r31562955
--- Diff: flink-core/src/main/java/org/apache/flink/api/common/io/FileInputFormat.java ---
@@ -628,9 +692,10 @@ public void open(FileInputSplit fileSplit) throws IOException {
  * @see org.apache.flink.api.common.io.InputStreamFSInputWrapper
  */
 protected FSDataInputStream decorateInputStream(FSDataInputStream inputStream, FileInputSplit fileSplit) throws Throwable {
-    // Wrap stream in an extracting (decompressing) stream if file ends with .deflate.
-    if (fileSplit.getPath().getName().endsWith(DEFLATE_SUFFIX)) {
-        return new InflaterInputStreamFSInputWrapper(stream);
+    // Wrap stream in an extracting (decompressing) stream if file ends with a known compression file extension.
+    InflaterInputStreamFactory<?> inflaterInputStreamFactory = getInflaterInputStreamFactory(fileSplit.getPath());
+    if (inflaterInputStreamFactory != null) {
+        return new InputStreamFSInputWrapper(inflaterInputStreamFactory.create(stream));
--- End diff --
Ah, okay, I see. I didn't read the code closely enough.
Add GZip support Key: FLINK-1981 URL: https://issues.apache.org/jira/browse/FLINK-1981 Project: Flink Issue Type: New Feature Components: Core Reporter: Sebastian Kruse Assignee: Sebastian Kruse Priority: Minor
GZip, as a commonly used compression format, should be supported in the same way as the already supported deflate files. This allows GZip files to be used with any subclass of FileInputFormat.
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-1981) Add GZip support
[ https://issues.apache.org/jira/browse/FLINK-1981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14569589#comment-14569589 ] ASF GitHub Bot commented on FLINK-1981: --- Github user rmetzger commented on a diff in the pull request: https://github.com/apache/flink/pull/762#discussion_r31559688
--- Diff: flink-core/src/main/java/org/apache/flink/api/common/io/FileInputFormat.java ---
@@ -21,10 +21,16 @@
 import java.io.IOException;
 import java.util.ArrayList;
 import java.util.Arrays;
+import java.util.HashMap;
 import java.util.HashSet;
 import java.util.List;
+import java.util.Map;
 import java.util.Set;
+import org.apache.commons.lang3.Validate;
--- End diff --
I'm really sorry that you ran into this, but the community recently decided to use Guava's Preconditions.check() instead of commons lang. Can you replace that?
Add GZip support Key: FLINK-1981 URL: https://issues.apache.org/jira/browse/FLINK-1981 Project: Flink Issue Type: New Feature Components: Core Reporter: Sebastian Kruse Assignee: Sebastian Kruse Priority: Minor
GZip, as a commonly used compression format, should be supported in the same way as the already supported deflate files. This allows GZip files to be used with any subclass of FileInputFormat.
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[GitHub] flink pull request: [FLINK-1981] add support for GZIP files
Github user rmetzger commented on a diff in the pull request: https://github.com/apache/flink/pull/762#discussion_r31562955
--- Diff: flink-core/src/main/java/org/apache/flink/api/common/io/FileInputFormat.java ---
@@ -628,9 +692,10 @@ public void open(FileInputSplit fileSplit) throws IOException {
  * @see org.apache.flink.api.common.io.InputStreamFSInputWrapper
  */
 protected FSDataInputStream decorateInputStream(FSDataInputStream inputStream, FileInputSplit fileSplit) throws Throwable {
-    // Wrap stream in an extracting (decompressing) stream if file ends with .deflate.
-    if (fileSplit.getPath().getName().endsWith(DEFLATE_SUFFIX)) {
-        return new InflaterInputStreamFSInputWrapper(stream);
+    // Wrap stream in an extracting (decompressing) stream if file ends with a known compression file extension.
+    InflaterInputStreamFactory<?> inflaterInputStreamFactory = getInflaterInputStreamFactory(fileSplit.getPath());
+    if (inflaterInputStreamFactory != null) {
+        return new InputStreamFSInputWrapper(inflaterInputStreamFactory.create(stream));
--- End diff --
Ah, okay, I see. I didn't read the code closely enough.
--- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
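The mechanism under discussion in this diff can be sketched with JDK streams only: map a file extension to a factory that decorates the raw input stream, and pass the stream through unchanged when no factory matches. This is an illustrative sketch, not Flink's actual API; it also makes the reviewer's point concrete, since an unmatched suffix could equally mean "uncompressed" or "unknown codec".

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;
import java.util.zip.InflaterInputStream;

public class SuffixInflaters {

    // registry: file extension -> decompressing stream factory
    static final Map<String, Function<InputStream, InputStream>> FACTORIES = new HashMap<>();
    static {
        FACTORIES.put("deflate", InflaterInputStream::new);
        FACTORIES.put("gz", in -> {
            try {
                return new GZIPInputStream(in);
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        });
    }

    /** Returns the factory registered for the file's extension, or null. */
    static Function<InputStream, InputStream> lookup(String fileName) {
        int dot = fileName.lastIndexOf('.');
        return dot < 0 ? null : FACTORIES.get(fileName.substring(dot + 1));
    }

    /** Decorates the stream if a factory matches, otherwise passes it through. */
    static InputStream decorate(String fileName, InputStream raw) {
        Function<InputStream, InputStream> factory = lookup(fileName);
        return factory == null ? raw : factory.apply(raw);
    }

    public static void main(String[] args) throws IOException {
        // gzip "hello" in memory, then read it back through the decorated stream
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bytes)) {
            gz.write("hello".getBytes("UTF-8"));
        }
        InputStream in = decorate("part-0.gz", new ByteArrayInputStream(bytes.toByteArray()));
        byte[] buf = new byte[16];
        int n = in.read(buf);
        System.out.println(new String(buf, 0, n, "UTF-8")); // prints "hello"
    }
}
```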
[jira] [Commented] (FLINK-2069) writeAsCSV function in DataStream Scala API creates no file
[ https://issues.apache.org/jira/browse/FLINK-2069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14569675#comment-14569675 ] ASF GitHub Bot commented on FLINK-2069: --- Github user rmetzger commented on the pull request: https://github.com/apache/flink/pull/759#issuecomment-108083643 Are there any tests for the streaming output formats? writeAsCSV function in DataStream Scala API creates no file --- Key: FLINK-2069 URL: https://issues.apache.org/jira/browse/FLINK-2069 Project: Flink Issue Type: Bug Components: Streaming Reporter: Faye Beligianni Priority: Blocker Labels: Streaming Fix For: 0.9 When the {{writeAsCSV}} function is used in the DataStream Scala API, no file is created in the specified path. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[GitHub] flink pull request: [FLINK-2069] Fix Scala CSV Output Format
Github user rmetzger commented on the pull request: https://github.com/apache/flink/pull/759#issuecomment-108083643 Are there any tests for the streaming output formats? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[jira] [Commented] (FLINK-1319) Add static code analysis for UDFs
[ https://issues.apache.org/jira/browse/FLINK-1319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14569669#comment-14569669 ] ASF GitHub Bot commented on FLINK-1319: --- Github user rmetzger commented on the pull request: https://github.com/apache/flink/pull/729#issuecomment-108081964 How about extending the `UDF contains obvious errors` message with some notes on how to completely disable the SCA? I fear that the message appears (blocks the program execution) due to a bug in the SCA and then users don't know how to get their stuff to run.
Add static code analysis for UDFs - Key: FLINK-1319 URL: https://issues.apache.org/jira/browse/FLINK-1319 Project: Flink Issue Type: New Feature Components: Java API, Scala API Reporter: Stephan Ewen Assignee: Timo Walther Priority: Minor
Flink's Optimizer takes information that tells it for UDFs which fields of the input elements are accessed, modified, or forwarded/copied. This information frequently helps to reuse partitionings, sorts, etc. It may speed up programs significantly, as it can frequently eliminate sorts and shuffles, which are costly. Right now, users can add lightweight annotations to UDFs to provide this information (such as adding {{@ConstantFields(0->3, 1, 2->1)}}). We worked with static code analysis of UDFs before, to determine this information automatically. This is an incredible feature, as it magically makes programs faster. For record-at-a-time operations (Map, Reduce, FlatMap, Join, Cross), this works surprisingly well in many cases. We used the Soot toolkit for the static code analysis. Unfortunately, Soot is LGPL licensed and thus we did not include any of the code so far. I propose to add this functionality to Flink, in the form of a drop-in addition, to work around the LGPL incompatibility with ASL 2.0. Users could simply download a special flink-code-analysis.jar and drop it into the lib folder to enable this functionality.
We may even add a script to tools that downloads that library automatically into the lib folder. This should be legally fine, since we do not redistribute LGPL code and only dynamically link it (the incompatibility with ASL 2.0 is mainly in the patentability, if I remember correctly). Prior work on this has been done by [~aljoscha] and [~skunert], which could provide a code base to start with. *Appendix* Homepage of the Soot static analysis toolkit: http://www.sable.mcgill.ca/soot/ Papers on static analysis for optimization: http://stratosphere.eu/assets/papers/EnablingOperatorReorderingSCA_12.pdf and http://stratosphere.eu/assets/papers/openingTheBlackBoxes_12.pdf Quick introduction to the Optimizer: http://stratosphere.eu/assets/papers/2014-VLDBJ_Stratosphere_Overview.pdf (Section 6) Optimizer for Iterations: http://stratosphere.eu/assets/papers/spinningFastIterativeDataFlows_12.pdf (Sections 4.3 and 5.3) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[GitHub] flink pull request: [FLINK-1319][core] Add static code analysis fo...
Github user rmetzger commented on the pull request: https://github.com/apache/flink/pull/729#issuecomment-108081964 How about extending the `UDF contains obvious errors` message with some notes on how to completely disable the SCA. I fear that the message appears (blocks the program execution) due to a bug in the SCA and then users don't know how to get their stuff to run. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] flink pull request: [FLINK-1319][core] Add static code analysis fo...
Github user uce commented on the pull request: https://github.com/apache/flink/pull/729#issuecomment-108093699 I agree with Robert, but for the initial version it won't matter as it should be disabled anyways. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] flink pull request: [FLINK-2069] Fix Scala CSV Output Format
Github user mbalassi commented on the pull request: https://github.com/apache/flink/pull/759#issuecomment-108093780 Not enough. `writeAsText` is covered a number of times in the integration tests and ITCases, but there is no test for other output formats. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[jira] [Commented] (FLINK-2069) writeAsCSV function in DataStream Scala API creates no file
[ https://issues.apache.org/jira/browse/FLINK-2069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14569725#comment-14569725 ] ASF GitHub Bot commented on FLINK-2069: --- Github user mbalassi commented on the pull request: https://github.com/apache/flink/pull/759#issuecomment-108093780 Not enough. `writeAsText` is covered a number of times in the integration tests and ITCases, but there is no test for other output formats. writeAsCSV function in DataStream Scala API creates no file --- Key: FLINK-2069 URL: https://issues.apache.org/jira/browse/FLINK-2069 Project: Flink Issue Type: Bug Components: Streaming Reporter: Faye Beligianni Priority: Blocker Labels: Streaming Fix For: 0.9 When the {{writeAsCSV}} function is used in the DataStream Scala API, no file is created in the specified path. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-1319) Add static code analysis for UDFs
[ https://issues.apache.org/jira/browse/FLINK-1319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14569723#comment-14569723 ] ASF GitHub Bot commented on FLINK-1319: --- Github user uce commented on the pull request: https://github.com/apache/flink/pull/729#issuecomment-108093699 I agree with Robert, but for the initial version it won't matter as it should be disabled anyways.
Add static code analysis for UDFs - Key: FLINK-1319 URL: https://issues.apache.org/jira/browse/FLINK-1319 Project: Flink Issue Type: New Feature Components: Java API, Scala API Reporter: Stephan Ewen Assignee: Timo Walther Priority: Minor
Flink's Optimizer takes information that tells it for UDFs which fields of the input elements are accessed, modified, or forwarded/copied. This information frequently helps to reuse partitionings, sorts, etc. It may speed up programs significantly, as it can frequently eliminate sorts and shuffles, which are costly. Right now, users can add lightweight annotations to UDFs to provide this information (such as adding {{@ConstantFields(0->3, 1, 2->1)}}). We worked with static code analysis of UDFs before, to determine this information automatically. This is an incredible feature, as it magically makes programs faster. For record-at-a-time operations (Map, Reduce, FlatMap, Join, Cross), this works surprisingly well in many cases. We used the Soot toolkit for the static code analysis. Unfortunately, Soot is LGPL licensed and thus we did not include any of the code so far. I propose to add this functionality to Flink, in the form of a drop-in addition, to work around the LGPL incompatibility with ASL 2.0. Users could simply download a special flink-code-analysis.jar and drop it into the lib folder to enable this functionality. We may even add a script to tools that downloads that library automatically into the lib folder.
This should be legally fine, since we do not redistribute LGPL code and only dynamically link it (the incompatibility with ASL 2.0 is mainly in the patentability, if I remember correctly). Prior work on this has been done by [~aljoscha] and [~skunert], which could provide a code base to start with. *Appendix* Homepage of the Soot static analysis toolkit: http://www.sable.mcgill.ca/soot/ Papers on static analysis for optimization: http://stratosphere.eu/assets/papers/EnablingOperatorReorderingSCA_12.pdf and http://stratosphere.eu/assets/papers/openingTheBlackBoxes_12.pdf Quick introduction to the Optimizer: http://stratosphere.eu/assets/papers/2014-VLDBJ_Stratosphere_Overview.pdf (Section 6) Optimizer for Iterations: http://stratosphere.eu/assets/papers/spinningFastIterativeDataFlows_12.pdf (Sections 4.3 and 5.3) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (FLINK-2108) Add score function for Predictors
[ https://issues.apache.org/jira/browse/FLINK-2108?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sachin Goel reassigned FLINK-2108: -- Assignee: Sachin Goel Add score function for Predictors - Key: FLINK-2108 URL: https://issues.apache.org/jira/browse/FLINK-2108 Project: Flink Issue Type: Improvement Components: Machine Learning Library Reporter: Theodore Vasiloudis Assignee: Sachin Goel Priority: Minor Labels: ML A score function for Predictor implementations should take a DataSet[(I, O)] and an (optional) scoring measure and return a score. The DataSet[(I, O)] would probably be the output of the predict function. For example in MultipleLinearRegression, we can call predict on a labeled dataset, get back predictions for each item in the data, and then call score with the resulting dataset as an argument and we should get back a score for the prediction quality, such as the R^2 score. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
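The R^2 score mentioned in the issue can be sketched in a few lines over (prediction, label) pairs, such as the output of a predict call. This is an illustrative standalone version, not FlinkML's actual API.

```java
public class RSquared {

    static double score(double[] predictions, double[] labels) {
        double mean = 0.0;
        for (double y : labels) mean += y;
        mean /= labels.length;

        double ssRes = 0.0, ssTot = 0.0;
        for (int i = 0; i < labels.length; i++) {
            ssRes += (labels[i] - predictions[i]) * (labels[i] - predictions[i]);
            ssTot += (labels[i] - mean) * (labels[i] - mean);
        }
        // R^2 = 1 - SS_res / SS_tot; 1.0 means a perfect fit,
        // 0.0 means no better than predicting the label mean
        return 1.0 - ssRes / ssTot;
    }

    public static void main(String[] args) {
        double[] labels = {1.0, 2.0, 3.0};
        System.out.println(score(new double[] {1.0, 2.0, 3.0}, labels)); // prints 1.0
        System.out.println(score(new double[] {2.0, 2.0, 2.0}, labels)); // prints 0.0
    }
}
```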
[GitHub] flink pull request: Hits
GitHub user mfahimazizi opened a pull request: https://github.com/apache/flink/pull/765 Hits HITS algorithm in Gelly. You can merge this pull request into a Git repository by running: $ git pull https://github.com/mfahimazizi/flink HITS Alternatively you can review and apply these changes as the patch at: https://github.com/apache/flink/pull/765.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #765 commit e3fe353a5a4fbca35a65672214f451cda58de6ab Author: mfahimazizi mfahimaz...@gmail.com Date: 2015-05-30T21:53:30Z HITS algorithm added commit 646660272c4c9b4152d2ebfc4bfbd6e04d891408 Author: mfahimazizi mfahimaz...@gmail.com Date: 2015-06-02T23:23:15Z HITS algorithm added_ --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
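For context on what the PR contributes, the HITS update rule can be sketched with a JDK-only power iteration over (source, target) edge pairs; the PR's Gelly implementation is vertex-centric, so this standalone version only illustrates the hub/authority updates, not the actual code under review.

```java
import java.util.Arrays;

public class HitsSketch {

    static double[][] hits(int numVertices, int[][] edges, int iterations) {
        double[] hub = new double[numVertices];
        double[] auth = new double[numVertices];
        Arrays.fill(hub, 1.0);
        for (int it = 0; it < iterations; it++) {
            // authority(v) = sum of hub scores of vertices linking to v
            Arrays.fill(auth, 0.0);
            for (int[] e : edges) auth[e[1]] += hub[e[0]];
            normalize(auth);
            // hub(v) = sum of authority scores of vertices v links to
            Arrays.fill(hub, 0.0);
            for (int[] e : edges) hub[e[0]] += auth[e[1]];
            normalize(hub);
        }
        return new double[][] {hub, auth};
    }

    static void normalize(double[] v) {
        double norm = 0.0;
        for (double x : v) norm += x * x;
        norm = Math.sqrt(norm);
        if (norm > 0) for (int i = 0; i < v.length; i++) v[i] /= norm;
    }

    public static void main(String[] args) {
        // 0 -> 2 and 1 -> 2: vertex 2 is the sole authority, 0 and 1 are equal hubs
        double[][] scores = hits(3, new int[][] {{0, 2}, {1, 2}}, 10);
        System.out.println("hubs:        " + Arrays.toString(scores[0]));
        System.out.println("authorities: " + Arrays.toString(scores[1]));
    }
}
```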
[jira] [Created] (FLINK-2138) PartitionCustom for streaming
Márton Balassi created FLINK-2138: - Summary: PartitionCustom for streaming Key: FLINK-2138 URL: https://issues.apache.org/jira/browse/FLINK-2138 Project: Flink Issue Type: New Feature Components: Streaming Affects Versions: 0.9 Reporter: Márton Balassi Priority: Minor The batch API has support for custom partitioning; this should be added for streaming with a similar signature. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
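The contract the batch API exposes via partitionCustom can be sketched without Flink: a user-supplied Partitioner maps a key to one of numPartitions channels, and the runtime routes each record accordingly. This standalone version only illustrates the routing logic; the interface name mirrors Flink's but the surrounding code is an assumption.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class CustomPartitioning {

    interface Partitioner<K> {
        int partition(K key, int numPartitions);
    }

    // route each key to the channel chosen by the user-supplied partitioner
    static <K> List<List<K>> route(List<K> keys, Partitioner<K> partitioner, int numPartitions) {
        List<List<K>> channels = new ArrayList<>();
        for (int i = 0; i < numPartitions; i++) channels.add(new ArrayList<>());
        for (K key : keys) {
            channels.get(partitioner.partition(key, numPartitions)).add(key);
        }
        return channels;
    }

    public static void main(String[] args) {
        // e.g. send even keys to channel 0 and odd keys to channel 1
        Partitioner<Integer> evenOdd = (key, n) -> key % 2;
        System.out.println(route(Arrays.asList(1, 2, 3, 4, 5), evenOdd, 2));
        // prints [[2, 4], [1, 3, 5]]
    }
}
```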
[GitHub] flink pull request: [contrib] Storm compatibility
GitHub user mbalassi opened a pull request: https://github.com/apache/flink/pull/764 [contrib] Storm compatibility
This is an updated version of #573. @mjsax addressed a number of comments since then and added READMEs, @szape did a code review and added an extra interface, and I have moved the codebase to flink-contrib. For the latter to be reasonable, I felt it necessary to break up flink-contrib into smaller submodules, despite @rmetzger advising against this approach. The names of the submodules are questionable; suggestions are welcome. I've seen test failures in storm-compatibility-core, @mjsax could you take a look please? I managed to make an error in the pom for the storm-compatibility examples, so that does not build currently; I will check it after I get some sleep. :) I would like to merge this to master as soon as the 0.9 branch is forked off and we can agree on it.
You can merge this pull request into a Git repository by running: $ git pull https://github.com/mbalassi/flink storm Alternatively you can review and apply these changes as the patch at: https://github.com/apache/flink/pull/764.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #764
commit c595c265d7b40106181e16b594a34cdf06beb017 Author: mjsax mj...@informatik.hu-berlin.de Date: 2015-05-13T15:56:12Z [storm-compat] Introduced Storm wrappers to Flink
commit 3db5335639f9d8fd0cbb1c7799af6205f8c04996 Author: mjsax mj...@informatik.hu-berlin.de Date: 2015-05-14T10:13:20Z [strom-compat] Added Storm API compatibility classes
commit 5ed5bbe5fa92e967de8ad448955aec7115fcd533 Author: mjsax mj...@informatik.hu-berlin.de Date: 2015-05-14T10:26:43Z [storm-compat] Added tests for Storm compatibility API
commit 9e89189ae7d0e7c48a1b9cc466e871d6919144c0 Author: mjsax mj...@informatik.hu-berlin.de Date: 2015-05-14T10:28:59Z [storm-compat] Added tests for Storm compatibility wrappers
commit ebdc0d23a6502de542d269a6f0881c89057d2e0e
Author: mjsax mj...@informatik.hu-berlin.de Date: 2015-05-14T10:55:44Z [storm-compat] Added abstract base to Storm compatibility examples commit 09ac389630f231f5a793da97b993c52b0e16777b Author: mjsax mj...@informatik.hu-berlin.de Date: 2015-05-14T10:57:46Z [storm-compat] Added Storm compatibility word count examples commit 5cf034e1d34e7ccfd745e2f7e93f2d247d3e6bbf Author: mjsax mj...@informatik.hu-berlin.de Date: 2015-05-14T10:59:38Z [storm-compat] Added ITCases to Storm compatibility examples commit 2eec0ac002d04ecfc09bf524ef3923979c141558 Author: mjsax mj...@informatik.hu-berlin.de Date: 2015-05-14T11:04:01Z [storm-compat] Added README files to Storm compatibility modules commit 2d200c0d4bc16c9f8c4999d5d4fa07e527f3824c Author: mjsax mj...@informatik.hu-berlin.de Date: 2015-05-21T08:15:20Z [storm-compat] Storm compatibility code cleanup commit b06ec1bc37c3d68a4e4f3568e9c37a15bc3881e8 Author: mjsax mj...@informatik.hu-berlin.de Date: 2015-05-21T08:16:58Z [storm-compat] Simple examples added commit 61ab549fd0a25a09df7b8cfcd16372b93b2cceec Author: szape nemderogator...@gmail.com Date: 2015-05-28T14:31:18Z [storm-compat] Storm compatibility layer wrappers refactor commit ed189aac5de79f1af50efcbb0a1d9defcc582d1a Author: szape nemderogator...@gmail.com Date: 2015-06-02T08:53:02Z [storm-compat] Added FiniteStormSpout interface commit 9704649267dd29291bb14f223f17a18fc2c26cd0 Author: mbalassi mbala...@apache.org Date: 2015-06-02T21:38:29Z [storm-compat] Moved Storm compatibility to flink-contrib commit 3db8724d92578f3815229c0ed2d6dafd7ddf6fd5 Author: mbalassi mbala...@apache.org Date: 2015-06-02T22:25:21Z [contrib] [build] Contrib separated to small projects --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[jira] [Created] (FLINK-2139) Test Streaming Outputformats
Márton Balassi created FLINK-2139: - Summary: Test Streaming Outputformats Key: FLINK-2139 URL: https://issues.apache.org/jira/browse/FLINK-2139 Project: Flink Issue Type: Test Components: Streaming Affects Versions: 0.9 Reporter: Márton Balassi Fix For: 0.9 Currently the only tested streaming core output format is writeAsText, and it is only tested indirectly in integration tests. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (FLINK-2136) Test the streaming scala API
Márton Balassi created FLINK-2136: - Summary: Test the streaming scala API Key: FLINK-2136 URL: https://issues.apache.org/jira/browse/FLINK-2136 Project: Flink Issue Type: Test Components: Scala API, Streaming Affects Versions: 0.9 Reporter: Márton Balassi There are no tests covering the streaming Scala API. I would suggest testing whether the StreamGraph created by a certain operation looks as expected. Deeper layers and the runtime should not be tested here; that is done in streaming-core. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[GitHub] flink pull request: [contrib] Storm compatibility
Github user mbalassi commented on the pull request: https://github.com/apache/flink/pull/764#issuecomment-108118618 Oh, and one more info for @mjsax: I added you as collaborator to my repo, so you can directly push to this branch. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[jira] [Created] (FLINK-2137) Expose partitionByHash for WindowedDataStream
Márton Balassi created FLINK-2137: - Summary: Expose partitionByHash for WindowedDataStream Key: FLINK-2137 URL: https://issues.apache.org/jira/browse/FLINK-2137 Project: Flink Issue Type: New Feature Components: Streaming Affects Versions: 0.9 Reporter: Márton Balassi Assignee: Gábor Hermann This functionality has been recently exposed for DataStreams and ConnectedDataStreams, but not for WindowedDataStreams yet. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[GitHub] flink pull request: [FLINK-2069] Fix Scala CSV Output Format
Github user mbalassi commented on the pull request: https://github.com/apache/flink/pull/759#issuecomment-108122772 Added a JIRA for the [issue](https://issues.apache.org/jira/browse/FLINK-2139). --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] flink pull request: Remove extra HTML tags in TypeInformation Java...
GitHub user hsaputra opened a pull request: https://github.com/apache/flink/pull/766 Remove extra HTML tags in TypeInformation JavaDoc class header. Simple cleanup PR removing extra ```li``` HTML tags in the TypeInformation JavaDoc class header. You can merge this pull request into a Git repository by running: $ git pull https://github.com/hsaputra/flink remove_extra_li_javadoc_typeinformation Alternatively you can review and apply these changes as the patch at: https://github.com/apache/flink/pull/766.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #766 commit f02fc66b9897cd6c76f193ef514b41388e7bf537 Author: Henry Saputra hsapu...@apache.org Date: 2015-06-03T04:10:06Z Remove extra li HTML tags in TypeInformation JavaDoc class header. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[jira] [Commented] (FLINK-1319) Add static code analysis for UDFs
[ https://issues.apache.org/jira/browse/FLINK-1319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14568988#comment-14568988 ] ASF GitHub Bot commented on FLINK-1319: --- Github user rmetzger commented on the pull request: https://github.com/apache/flink/pull/729#issuecomment-107924813 I understand that concern. But using sysout will also be interleaved with the client's sysout printing and the regular system logging. Also, I think it's very bad practice to print stuff using System.out, because it's not controllable in any way. With log4j we can configure the analysis output the way we want. If you want the messages to look like regular sysout text, we can specify a custom output schema for the classes in the sca Java package.
Add static code analysis for UDFs - Key: FLINK-1319 URL: https://issues.apache.org/jira/browse/FLINK-1319 Project: Flink Issue Type: New Feature Components: Java API, Scala API Reporter: Stephan Ewen Assignee: Timo Walther Priority: Minor
Flink's Optimizer takes information that tells it for UDFs which fields of the input elements are accessed, modified, or forwarded/copied. This information frequently helps to reuse partitionings, sorts, etc. It may speed up programs significantly, as it can frequently eliminate sorts and shuffles, which are costly. Right now, users can add lightweight annotations to UDFs to provide this information (such as adding {{@ConstantFields(0->3, 1, 2->1)}}). We worked with static code analysis of UDFs before, to determine this information automatically. This is an incredible feature, as it magically makes programs faster. For record-at-a-time operations (Map, Reduce, FlatMap, Join, Cross), this works surprisingly well in many cases. We used the Soot toolkit for the static code analysis. Unfortunately, Soot is LGPL licensed and thus we did not include any of the code so far.
I propose to add this functionality to Flink, in the form of a drop-in addition, to work around the LGPL incompatibility with ASL 2.0. Users could simply download a special flink-code-analysis.jar and drop it into the lib folder to enable this functionality. We may even add a script to tools that downloads that library automatically into the lib folder. This should be legally fine, since we do not redistribute LGPL code and only dynamically link it (the incompatibility with ASL 2.0 is mainly in the patentability, if I remember correctly). Prior work on this has been done by [~aljoscha] and [~skunert], which could provide a code base to start with. *Appendix* Homepage of the Soot static analysis toolkit: http://www.sable.mcgill.ca/soot/ Papers on static analysis for optimization: http://stratosphere.eu/assets/papers/EnablingOperatorReorderingSCA_12.pdf and http://stratosphere.eu/assets/papers/openingTheBlackBoxes_12.pdf Quick introduction to the Optimizer: http://stratosphere.eu/assets/papers/2014-VLDBJ_Stratosphere_Overview.pdf (Section 6) Optimizer for Iterations: http://stratosphere.eu/assets/papers/spinningFastIterativeDataFlows_12.pdf (Sections 4.3 and 5.3) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[GitHub] flink pull request: [FLINK-1430] [streaming] Scala API completenes...
Github user asfgit closed the pull request at: https://github.com/apache/flink/pull/753 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] flink pull request: [FLINK-1319][core] Add static code analysis fo...
Github user rmetzger commented on the pull request: https://github.com/apache/flink/pull/729#issuecomment-107924813 I understand that concern. But using sysout will also be interleaved with the client's sysout printing and the regular system logging. Also, I think it's very bad practice to print stuff using System.out, because it's not controllable in any way. With log4j we can configure the analysis output the way we want. If you want the messages to look like regular sysout text, we can specify a custom output schema for the classes in the sca Java package. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
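The "custom output schema for the classes in the sca package" idea could look roughly like the following log4j.properties fragment. This is a hedged sketch: the logger/appender names are illustrative and the package name is an assumption, not confirmed by the thread.

```properties
# Route the analyzer's output through its own logger so it can be silenced
# or reformatted independently of the client's sysout printing.
# NOTE: the package name below is an assumption for illustration.
log4j.logger.org.apache.flink.api.java.sca=INFO, sca
log4j.additivity.org.apache.flink.api.java.sca=false
log4j.appender.sca=org.apache.log4j.ConsoleAppender
log4j.appender.sca.layout=org.apache.log4j.PatternLayout
# plain message only, so the analysis output reads like regular sysout text
log4j.appender.sca.layout.ConversionPattern=%m%n
```

Setting the logger to OFF instead of INFO would be one way to "completely disable" the output, as requested earlier in the thread.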
[jira] [Commented] (FLINK-2111) Add terminate signal to cleanly stop streaming jobs
[ https://issues.apache.org/jira/browse/FLINK-2111?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14568994#comment-14568994 ] ASF GitHub Bot commented on FLINK-2111: --- Github user mjsax commented on the pull request: https://github.com/apache/flink/pull/750#issuecomment-107926235 Hi, after thinking about the signal name again, I disagree. terminate does not mean kill; it is a request to shut down gracefully. stop, from my point of view, is too soft, as it does not indicate strongly enough that the job is completely shut down and cleared. I would interpret stop more as an interrupt signal (with a corresponding re-start signal that wakes up the job again). Any other opinions? Maybe we need a vote on the signal name.
Add terminate signal to cleanly stop streaming jobs - Key: FLINK-2111 URL: https://issues.apache.org/jira/browse/FLINK-2111 Project: Flink Issue Type: Improvement Components: Distributed Runtime, JobManager, Local Runtime, Streaming, TaskManager, Webfrontend Reporter: Matthias J. Sax Assignee: Matthias J. Sax Priority: Minor
Currently, streaming jobs can only be stopped using the cancel command, which is a hard stop with no clean shutdown. The newly introduced terminate signal will only affect streaming source tasks, so that the sources can stop emitting data and terminate cleanly, resulting in a clean termination of the whole streaming job. This feature is a prerequisite for https://issues.apache.org/jira/browse/FLINK-1929
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
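Whatever the signal ends up being called, the shutdown style it implies can be sketched with a plain flag-checked source loop: only the source observes the signal, finishes its loop cleanly, and downstream processing drains naturally instead of being killed mid-record. All names here are illustrative, not Flink's runtime API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicBoolean;

public class GracefulSource {

    private final AtomicBoolean running = new AtomicBoolean(true);
    final List<Integer> emitted = new ArrayList<>();

    void run() {
        int next = 0;
        // keep emitting until the terminate/stop signal arrives
        // (bounded here only so the sketch always finishes)
        while (running.get() && next < 1_000_000) {
            emitted.add(next++);
        }
        // clean shutdown point: flush buffers, close connections, etc.
    }

    void signalStop() {
        running.set(false);
    }

    public static void main(String[] args) throws InterruptedException {
        GracefulSource source = new GracefulSource();
        Thread worker = new Thread(source::run);
        worker.start();
        Thread.sleep(10);      // let the source emit for a moment
        source.signalStop();   // request a graceful shutdown
        worker.join(1000);     // the loop exits on its own, no interrupt needed
        System.out.println("terminated cleanly after " + source.emitted.size() + " records");
    }
}
```

By contrast, cancel in this picture would interrupt the worker thread outright, with no chance to reach the clean shutdown point.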
[GitHub] flink pull request: [ml] Rework of the optimization framework
GitHub user tillrohrmann opened a pull request: https://github.com/apache/flink/pull/758 [ml] Rework of the optimization framework This PR reworks the current optimization framework to make the regularization part of the optimization algorithm. Furthermore, it consolidates the `LossFunction` by defining a common interface for loss functions. The `GenericLossFunction` allows constructing a loss function by specifying the outer loss function and the prediction function individually. Additionally, this PR introduces some syntactic sugar which makes programming with broadcast variables easier. You can merge this pull request into a Git repository by running: $ git pull https://github.com/tillrohrmann/flink mlSugar Alternatively you can review and apply these changes as the patch at: https://github.com/apache/flink/pull/758.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #758 commit e82782e5a165d9e3a348db5efe6c8188c9cbac72 Author: Till Rohrmann trohrm...@apache.org Date: 2015-05-28T01:03:24Z [ml] Adds syntactic sugar for map with single broadcast element. Rewrites the optimization framework to consolidate the loss function. Adds closure cleaner to convenience functions on RichDataSet Removing regularization from LossFunction and making it part of the optimizer.
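The composition idea behind `GenericLossFunction` can be sketched outside Flink (class and method names here are illustrative, not the FlinkML API): the loss is built from an outer loss and a prediction function that can be chosen independently.

```java
import java.util.function.DoubleBinaryOperator;

// Illustrative sketch of a loss composed from two pluggable parts;
// a single scalar weight/feature is used for brevity.
class GenericLoss {
    private final DoubleBinaryOperator outerLoss;   // (prediction, label) -> loss
    private final DoubleBinaryOperator prediction;  // (weight, feature)  -> prediction

    GenericLoss(DoubleBinaryOperator outerLoss, DoubleBinaryOperator prediction) {
        this.outerLoss = outerLoss;
        this.prediction = prediction;
    }

    /** Evaluates outerLoss(prediction(weight, feature), label). */
    double loss(double weight, double feature, double label) {
        return outerLoss.applyAsDouble(
                prediction.applyAsDouble(weight, feature), label);
    }
}
```

For example, squared loss with a linear prediction is `new GenericLoss((p, y) -> 0.5 * (p - y) * (p - y), (w, x) -> w * x)`; swapping in a different outer loss reuses the same prediction function, which is the consolidation the PR aims for.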
[jira] [Resolved] (FLINK-2102) Add predict operation for LabeledVector
[ https://issues.apache.org/jira/browse/FLINK-2102?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Till Rohrmann resolved FLINK-2102. -- Resolution: Fixed Added via d163a817fa2e330e86384d0bbcd104f051a6fb48 Add predict operation for LabeledVector --- Key: FLINK-2102 URL: https://issues.apache.org/jira/browse/FLINK-2102 Project: Flink Issue Type: Improvement Components: Machine Learning Library Reporter: Theodore Vasiloudis Assignee: Theodore Vasiloudis Priority: Minor Labels: ML Fix For: 0.9 Currently we can only call predict on DataSet[V : Vector]. A lot of times though we have a DataSet[LabeledVector] that we split into a train and test set. We should be able to make predictions on the test DataSet[LabeledVector] without having to transform it into a DataSet[Vector]
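The convenience the issue asks for can be sketched with plain collections (the types and method below are illustrative, not the FlinkML API): predicting directly on labeled data carries the true label through, so the result pairs each label with its prediction without a separate stripping step.

```java
import java.util.List;
import java.util.function.DoubleUnaryOperator;
import java.util.stream.Collectors;

// Hypothetical labeled point with a single feature, for brevity.
class LabeledPoint {
    final double label;
    final double feature;

    LabeledPoint(double label, double feature) {
        this.label = label;
        this.feature = feature;
    }
}

class Predictor {
    /** Returns {trueLabel, prediction} pairs for a labeled test set. */
    static List<double[]> predictLabeled(List<LabeledPoint> testSet,
                                         DoubleUnaryOperator model) {
        return testSet.stream()
                .map(p -> new double[] { p.label, model.applyAsDouble(p.feature) })
                .collect(Collectors.toList());
    }
}
```

With only a predict-on-`Vector` operation, the caller would first have to drop the labels and then join them back to evaluate the model; keeping the label in the result removes that round trip.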
[jira] [Commented] (FLINK-2102) Add predict operation for LabeledVector
[ https://issues.apache.org/jira/browse/FLINK-2102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14568966#comment-14568966 ] ASF GitHub Bot commented on FLINK-2102: --- Github user asfgit closed the pull request at: https://github.com/apache/flink/pull/744
[GitHub] flink pull request: [FLINK-2102] [ml] Add predict operation for La...
Github user asfgit closed the pull request at: https://github.com/apache/flink/pull/744
[GitHub] flink pull request: [ml] [WIP] Consolidation of loss function
Github user tillrohrmann closed the pull request at: https://github.com/apache/flink/pull/740
[GitHub] flink pull request: [ml] [WIP] Consolidation of loss function
Github user tillrohrmann commented on the pull request: https://github.com/apache/flink/pull/740#issuecomment-107903012 I closed this PR to open it as a proper PR again.
[jira] [Commented] (FLINK-1430) Add test for streaming scala api completeness
[ https://issues.apache.org/jira/browse/FLINK-1430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14568874#comment-14568874 ] ASF GitHub Bot commented on FLINK-1430: --- Github user asfgit closed the pull request at: https://github.com/apache/flink/pull/753 Add test for streaming scala api completeness - Key: FLINK-1430 URL: https://issues.apache.org/jira/browse/FLINK-1430 Project: Flink Issue Type: Test Components: Streaming Affects Versions: 0.9 Reporter: Márton Balassi Assignee: Márton Balassi Currently the completeness of the streaming scala api is not tested.
[GitHub] flink pull request: [ml] Rework of the optimization framework
Github user tillrohrmann commented on the pull request: https://github.com/apache/flink/pull/758#issuecomment-107905323 This PR is the new PR for #740.
[jira] [Commented] (FLINK-2069) writeAsCSV function in DataStream Scala API creates no file
[ https://issues.apache.org/jira/browse/FLINK-2069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14568999#comment-14568999 ] Faye Beligianni commented on FLINK-2069: Hello, I am using a timeout of 1 in the {{iterate()}} call. You can find the code in which I tried to use the {{writeAsCsv}} function at this link: [https://github.com/fobeligi/incubator-flink/blob/inc-ml-test/flink-staging/flink-streaming/flink-streaming-ml/src/test/scala/org/apache/flink/streaming/incrementalML/test/classifier/HoeffdingTreeITSuite.scala] I am using this Scala test to run an implementation of the Hoeffding Tree algorithm. writeAsCSV function in DataStream Scala API creates no file --- Key: FLINK-2069 URL: https://issues.apache.org/jira/browse/FLINK-2069 Project: Flink Issue Type: Bug Components: Streaming Reporter: Faye Beligianni Priority: Blocker Labels: Streaming Fix For: 0.9 When the {{writeAsCSV}} function is used in the DataStream Scala API, no file is created in the specified path.
[jira] [Commented] (FLINK-1319) Add static code analysis for UDFs
[ https://issues.apache.org/jira/browse/FLINK-1319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14569150#comment-14569150 ] ASF GitHub Bot commented on FLINK-1319: --- Github user uce commented on a diff in the pull request: https://github.com/apache/flink/pull/729#discussion_r31527840 --- Diff: flink-core/src/main/java/org/apache/flink/api/common/UdfAnalysisMode.java --- @@ -0,0 +1,45 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * License); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an AS IS BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.flink.api.common; + +/** + * Specifies to which extent user-defined functions are analyzed in order + * to give the Flink optimizer an insight of UDF internals and inform + * the user about common implementation mistakes. + * + */ +public enum UdfAnalysisMode { --- End diff -- This is user-facing. I vote to rename it. @rmetzger agrees that in his experience the UDF part can be misleading. I understand why you chose this though... the operators make use of the UDF term all over the place. What about `CodeAnalysisMode`? After all both this PR and the package are called *code analysis* and not UDF analysis. 
Add static code analysis for UDFs - Key: FLINK-1319 URL: https://issues.apache.org/jira/browse/FLINK-1319 Project: Flink Issue Type: New Feature Components: Java API, Scala API Reporter: Stephan Ewen Assignee: Timo Walther Priority: Minor Flink's Optimizer takes information that tells it for UDFs which fields of the input elements are accessed, modified, or forwarded/copied. This information frequently helps to reuse partitionings, sorts, etc. It may speed up programs significantly, as it can frequently eliminate sorts and shuffles, which are costly. Right now, users can add lightweight annotations to UDFs to provide this information (such as adding {{@ConstandFields(0-3, 1, 2-1)}}). We worked with static code analysis of UDFs before, to determine this information automatically. This is an incredible feature, as it magically makes programs faster. For record-at-a-time operations (Map, Reduce, FlatMap, Join, Cross), this works surprisingly well in many cases. We used the Soot toolkit for the static code analysis. Unfortunately, Soot is LGPL licensed and thus we did not include any of the code so far. I propose to add this functionality to Flink, in the form of a drop-in addition, to work around the LGPL incompatibility with ASL 2.0. Users could simply download a special flink-code-analysis.jar and drop it into the lib folder to enable this functionality. We may even add a script to tools that downloads that library automatically into the lib folder. This should be legally fine, since we do not redistribute LGPL code and only dynamically link it (the incompatibility with ASL 2.0 is mainly in the patentability, if I remember correctly). Prior work on this has been done by [~aljoscha] and [~skunert], which could provide a code base to start with.
*Appendix* Homepage of the Soot static analysis toolkit: http://www.sable.mcgill.ca/soot/ Papers on static analysis and for optimization: http://stratosphere.eu/assets/papers/EnablingOperatorReorderingSCA_12.pdf and http://stratosphere.eu/assets/papers/openingTheBlackBoxes_12.pdf Quick introduction to the Optimizer: http://stratosphere.eu/assets/papers/2014-VLDBJ_Stratosphere_Overview.pdf (Section 6) Optimizer for Iterations: http://stratosphere.eu/assets/papers/spinningFastIterativeDataFlows_12.pdf (Sections 4.3 and 5.3)
[GitHub] flink pull request: [FLINK-1319][core] Add static code analysis fo...
Github user uce commented on a diff in the pull request: https://github.com/apache/flink/pull/729#discussion_r31527840 --- Diff: flink-core/src/main/java/org/apache/flink/api/common/UdfAnalysisMode.java --- @@ -0,0 +1,45 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * License); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an AS IS BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.flink.api.common; + +/** + * Specifies to which extent user-defined functions are analyzed in order + * to give the Flink optimizer an insight of UDF internals and inform + * the user about common implementation mistakes. + * + */ +public enum UdfAnalysisMode { --- End diff -- This is user-facing. I vote to rename it. @rmetzger agrees that in his experience the UDF part can be misleading. I understand why you chose this though... the operators make use of the UDF term all over the place. What about `CodeAnalysisMode`? After all both this PR and the package are called *code analysis* and not UDF analysis.
[jira] [Commented] (FLINK-1319) Add static code analysis for UDFs
[ https://issues.apache.org/jira/browse/FLINK-1319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14569154#comment-14569154 ] ASF GitHub Bot commented on FLINK-1319: --- Github user uce commented on a diff in the pull request: https://github.com/apache/flink/pull/729#discussion_r31528084 --- Diff: flink-core/src/main/java/org/apache/flink/api/common/UdfAnalysisMode.java --- @@ -0,0 +1,45 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * License); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an AS IS BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.flink.api.common; + +/** + * Specifies to which extent user-defined functions are analyzed in order --- End diff -- - I would make this more concrete. What about the list from your initial PR comment? - line 25: empty
[jira] [Commented] (FLINK-2130) RabbitMQ source does not fail when failing to retrieve elements
[ https://issues.apache.org/jira/browse/FLINK-2130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14569132#comment-14569132 ] Robert Metzger commented on FLINK-2130: --- I would let the source fail. All the other components in Flink also always fail immediately. I guess there are ways to configure retries at the RabbitMQ connector. RabbitMQ source does not fail when failing to retrieve elements --- Key: FLINK-2130 URL: https://issues.apache.org/jira/browse/FLINK-2130 Project: Flink Issue Type: Bug Components: Streaming, Streaming Connectors Reporter: Stephan Ewen The RMQ source only logs when elements cannot be retrieved. Failures are not propagated.
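The retry-then-fail behavior suggested above can be sketched generically (the class and method names are illustrative, not the actual RabbitMQ connector API): retry a bounded number of times, then propagate the failure instead of only logging it.

```java
import java.util.function.Supplier;

// Illustrative fail-fast fetch helper; not the Flink connector code.
class FailFastFetcher {
    /** Retries the fetch up to maxAttempts times, then fails the source. */
    static <T> T fetchWithRetries(Supplier<T> fetch, int maxAttempts) {
        RuntimeException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return fetch.get();
            } catch (RuntimeException e) {
                last = e;  // remember the cause; a real source would also log it
            }
        }
        // all attempts exhausted: rethrow so the job sees the error
        throw new RuntimeException(
                "fetch failed after " + maxAttempts + " attempts", last);
    }
}
```

The key point matches the comment: transient failures can be retried locally, but a persistent failure must surface as an exception so the job fails rather than silently producing no data.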
[jira] [Commented] (FLINK-2069) writeAsCSV function in DataStream Scala API creates no file
[ https://issues.apache.org/jira/browse/FLINK-2069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14569137#comment-14569137 ] Aljoscha Krettek commented on FLINK-2069: - Ha! I can finally reproduce it. No fix yet, though.
[GitHub] flink pull request: [FLINK-1319][core] Add static code analysis fo...
Github user uce commented on a diff in the pull request: https://github.com/apache/flink/pull/729#discussion_r31527970 --- Diff: flink-core/src/main/java/org/apache/flink/api/common/UdfAnalysisMode.java --- @@ -0,0 +1,45 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * License); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an AS IS BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.flink.api.common; + +/** + * Specifies to which extent user-defined functions are analyzed in order + * to give the Flink optimizer an insight of UDF internals and inform + * the user about common implementation mistakes. + * + */ +public enum UdfAnalysisMode { + + /** + * UDF analysis does not take place. + */ + DISABLED, + + /** + * Hints for improvement of the program are printed to the log. + */ + HINTING_ENABLED, + + /** + * The program will be automatically optimized with knowledge from UDF + * analysis. + */ + OPTIMIZING_ENABLED; + +} --- End diff -- Since the user will have to set this, what about keeping it short? `DISABLE`, `HINT`, `OPTIMIZE`?
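The argument for short constants is that users type them in configuration. A minimal sketch of that mapping, using the shortened names proposed in the review (the enum name `CodeAnalysisMode` follows the earlier review comment; the parsing helper is hypothetical):

```java
// Illustrative enum with the short, user-facing constants suggested in
// the review; fromConfig is a hypothetical helper, not Flink API.
enum CodeAnalysisMode {
    DISABLE,
    HINT,
    OPTIMIZE;

    /** Maps a user-supplied config string onto the enum, case-insensitively. */
    static CodeAnalysisMode fromConfig(String value) {
        return valueOf(value.trim().toUpperCase(java.util.Locale.ROOT));
    }
}
```

With short constants, a config entry reads naturally (e.g. `hint`), whereas long names like `HINTING_ENABLED` are harder to remember and type correctly.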
[jira] [Commented] (FLINK-2069) writeAsCSV function in DataStream Scala API creates no file
[ https://issues.apache.org/jira/browse/FLINK-2069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14569184#comment-14569184 ] Aljoscha Krettek commented on FLINK-2069: - I think I finally fixed it. Could you try this code: https://github.com/apache/flink/pull/759
[jira] [Commented] (FLINK-2069) writeAsCSV function in DataStream Scala API creates no file
[ https://issues.apache.org/jira/browse/FLINK-2069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14569183#comment-14569183 ] ASF GitHub Bot commented on FLINK-2069: --- GitHub user aljoscha opened a pull request: https://github.com/apache/flink/pull/759 [FLINK-2069] Fix Scala CSV Output Format Before, the Scala Streaming API used the Java CsvOutputFormat. This is not compatible with Scala Tuples. The FileSinkFunction would silently swallow the thrown exceptions and the job would finish cleanly. You can merge this pull request into a Git repository by running: $ git pull https://github.com/aljoscha/flink fix-scala-csv-output Alternatively you can review and apply these changes as the patch at: https://github.com/apache/flink/pull/759.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #759 commit 16986ced8b6961bb2ada269b29dd4a5b59159983 Author: Aljoscha Krettek aljoscha.kret...@gmail.com Date: 2015-06-02T14:37:06Z [FLINK-2069] Fix Scala CSV Output Format
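The failure mode described in the PR, a sink that swallows exceptions so the job "finishes cleanly" with no output, can be sketched in a simplified form (this is illustrative, not the actual FileSinkFunction code):

```java
import java.util.ArrayList;
import java.util.List;

// Simplified sketch contrasting a sink that swallows write failures
// with one that propagates them; not the Flink sink code.
class Sink {
    final List<String> written = new ArrayList<>();

    /** Buggy variant: failures vanish and the job appears to succeed. */
    void writeSwallowing(String record, boolean failing) {
        try {
            if (failing) throw new RuntimeException("incompatible record type");
            written.add(record);
        } catch (RuntimeException e) {
            // swallowed: the job completes cleanly, but nothing was written
        }
    }

    /** Fixed variant: the failure surfaces and the job fails visibly. */
    void writePropagating(String record, boolean failing) {
        if (failing) throw new RuntimeException("incompatible record type");
        written.add(record);
    }
}
```

This is why the bug presented as "no file is created" rather than as an error: the incompatibility between the Java CsvOutputFormat and Scala tuples threw on every record, and the swallowed exceptions left an empty result with a clean exit.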
[GitHub] flink pull request: [FLINK-1319][core] Add static code analysis fo...
Github user uce commented on a diff in the pull request: https://github.com/apache/flink/pull/729#discussion_r31530742 --- Diff: flink-java/src/main/java/org/apache/flink/api/java/operators/SingleInputUdfOperator.java --- @@ -54,8 +54,11 @@ private Map<String, DataSet<?>> broadcastVariables; + // NOTE: only set this variable via setSemanticProperties() --- End diff -- I think this refactoring is quite fragile. The semantic properties utility is not returning an empty properties object, but null, and you take care of setting it correctly here depending on whether the forwarded fields have been set manually or not. If optimize is enabled and there are manual annotations, they will be overridden. I am wondering if it is better to have manual annotations trump optimizer annotations. What's your opinion on this?
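The precedence rule proposed in the review, manual annotations trumping optimizer-inferred ones, can be sketched as a simple merge (the types below are illustrative; Flink's semantic properties are richer than a field-index map):

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative merge of forwarded-field mappings (sourceField -> targetField);
// not Flink's actual SemanticProperties implementation.
class SemanticMerge {
    /** Manual annotations take precedence over analyzer results. */
    static Map<Integer, Integer> merge(Map<Integer, Integer> manual,
                                       Map<Integer, Integer> inferred) {
        Map<Integer, Integer> result = new LinkedHashMap<>(inferred);
        result.putAll(manual);  // manual entries overwrite inferred ones
        return result;
    }
}
```

Starting from the inferred map and overwriting with the manual one encodes the suggested rule directly: analysis fills in whatever the user did not declare, and an explicit declaration is never silently overridden.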
[jira] [Updated] (FLINK-2130) RabbitMQ source does not fail when failing to retrieve elements
[ https://issues.apache.org/jira/browse/FLINK-2130?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Metzger updated FLINK-2130: -- Component/s: Streaming Connectors Streaming
[jira] [Commented] (FLINK-1319) Add static code analysis for UDFs
[ https://issues.apache.org/jira/browse/FLINK-1319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14569152#comment-14569152 ] ASF GitHub Bot commented on FLINK-1319: --- Github user uce commented on a diff in the pull request: https://github.com/apache/flink/pull/729#discussion_r31527970 --- Diff: flink-core/src/main/java/org/apache/flink/api/common/UdfAnalysisMode.java --- @@ -0,0 +1,45 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * License); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an AS IS BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.flink.api.common; + +/** + * Specifies to which extent user-defined functions are analyzed in order + * to give the Flink optimizer an insight of UDF internals and inform + * the user about common implementation mistakes. + * + */ +public enum UdfAnalysisMode { + + /** + * UDF analysis does not take place. + */ + DISABLED, + + /** + * Hints for improvement of the program are printed to the log. + */ + HINTING_ENABLED, + + /** + * The program will be automatically optimized with knowledge from UDF + * analysis. + */ + OPTIMIZING_ENABLED; + +} --- End diff -- Since the user will have to set this, what about keeping it short? `DISABLE`, `HINT`, `OPTIMIZE`?
[jira] [Commented] (FLINK-1319) Add static code analysis for UDFs
[ https://issues.apache.org/jira/browse/FLINK-1319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14569202#comment-14569202 ] ASF GitHub Bot commented on FLINK-1319: --- Github user uce commented on a diff in the pull request: https://github.com/apache/flink/pull/729#discussion_r31530742 --- Diff: flink-java/src/main/java/org/apache/flink/api/java/operators/SingleInputUdfOperator.java --- @@ -54,8 +54,11 @@ private MapString, DataSet? broadcastVariables; + // NOTE: only set this variable via setSemanticProperties() --- End diff -- I think this refactoring is quite fragile. The semantic properties utility is not returning an empty properties object, but null and you take care of setting it correctly here depending on whether the forwarded fields have been set manually or not. If optimize is enabled and there are manual annotations, they will be overriden. I am wondering if it is better to have manual annotations trump optimizer annotations. What's your opinion on this? Add static code analysis for UDFs - Key: FLINK-1319 URL: https://issues.apache.org/jira/browse/FLINK-1319 Project: Flink Issue Type: New Feature Components: Java API, Scala API Reporter: Stephan Ewen Assignee: Timo Walther Priority: Minor Flink's Optimizer takes information that tells it for UDFs which fields of the input elements are accessed, modified, or frwarded/copied. This information frequently helps to reuse partitionings, sorts, etc. It may speed up programs significantly, as it can frequently eliminate sorts and shuffles, which are costly. Right now, users can add lightweight annotations to UDFs to provide this information (such as adding {{@ConstandFields(0-3, 1, 2-1)}}. We worked with static code analysis of UDFs before, to determine this information automatically. This is an incredible feature, as it magically makes programs faster. For record-at-a-time operations (Map, Reduce, FlatMap, Join, Cross), this works surprisingly well in many cases. 
We used the Soot toolkit for the static code analysis. Unfortunately, Soot is LGPL licensed and thus we did not include any of the code so far. I propose to add this functionality to Flink in the form of a drop-in addition, to work around the LGPL incompatibility with ASL 2.0. Users could simply download a special flink-code-analysis.jar and drop it into the lib folder to enable this functionality. We may even add a script to tools that downloads that library automatically into the lib folder. This should be legally fine, since we do not redistribute LGPL code and only dynamically link it (the incompatibility with ASL 2.0 is mainly in the patentability, if I remember correctly). Prior work on this has been done by [~aljoscha] and [~skunert], which could provide a code base to start with. *Appendix* Homepage of the Soot static analysis toolkit: http://www.sable.mcgill.ca/soot/ Papers on static analysis and for optimization: http://stratosphere.eu/assets/papers/EnablingOperatorReorderingSCA_12.pdf and http://stratosphere.eu/assets/papers/openingTheBlackBoxes_12.pdf Quick introduction to the Optimizer: http://stratosphere.eu/assets/papers/2014-VLDBJ_Stratosphere_Overview.pdf (Section 6) Optimizer for Iterations: http://stratosphere.eu/assets/papers/spinningFastIterativeDataFlows_12.pdf (Sections 4.3 and 5.3) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
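The precedence question raised in the review comment above (should manual forwarded-field annotations trump optimizer-derived ones?) can be sketched as follows. This is an illustrative helper only; the class and method names are assumptions, not Flink's actual `SingleInputUdfOperator` code:

```java
import java.util.Optional;

// Illustrative sketch: resolving semantic properties so that manually
// set annotations always trump analyzer-derived ones. Hypothetical
// names, not Flink's API.
public class SemanticPropsResolver {

    /** Returns the manual annotation if present, else the analyzer result (which may be null). */
    public static String resolve(String manualAnnotation, String analyzerResult) {
        // Manual annotations win; the static-analysis result is only a
        // fallback, so enabling the analyzer never overrides user hints.
        return Optional.ofNullable(manualAnnotation).orElse(analyzerResult);
    }
}
```

With this precedence, the fragility the reviewer describes disappears: enabling optimization can only add information where the user provided none.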
[GitHub] flink pull request: [FLINK-2111] Add terminate signal to cleanly...
Github user mjsax commented on the pull request: https://github.com/apache/flink/pull/750#issuecomment-107926235 Hi, after thinking about the signal name again, I disagree. terminate does not mean kill, but is a request to gracefully shut down. stop, from my point of view, is too soft, as it does not indicate strongly enough that the job is completely shut down and cleared. I would interpret stop more as an interrupt signal (with a corresponding re-start signal that wakes the job up again). Any other opinions? Maybe we need a vote on the signal name. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] flink pull request: [FLINK-1319][core] Add static code analysis fo...
Github user uce commented on a diff in the pull request: https://github.com/apache/flink/pull/729#discussion_r31528084 --- Diff: flink-core/src/main/java/org/apache/flink/api/common/UdfAnalysisMode.java --- @@ -0,0 +1,45 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.flink.api.common; + +/** + * Specifies to which extent user-defined functions are analyzed in order --- End diff -- - I would make this more concrete. What about the list from your initial PR comment? - line 25: empty
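For context, an analysis-mode switch of the kind under review might take the following shape. This is a hedged sketch; the constant names are assumptions, not the actual contents of the pull request's `UdfAnalysisMode`:

```java
// Hedged sketch of an analysis-mode enum like the reviewed
// UdfAnalysisMode. The constants here are illustrative assumptions.
public enum AnalysisMode {
    DISABLED,   // skip static code analysis entirely
    HINTING,    // analyze and report hints without changing the program
    OPTIMIZING; // analyze and feed derived annotations to the optimizer

    /** Whether any form of static analysis runs in this mode. */
    public boolean isEnabled() {
        return this != DISABLED;
    }
}
```

Splitting "analyze and report" from "analyze and apply" keeps the magical-speedup path opt-in while still letting users discover what the analyzer would do.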
[jira] [Commented] (FLINK-2069) writeAsCSV function in DataStream Scala API creates no file
[ https://issues.apache.org/jira/browse/FLINK-2069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14569179#comment-14569179 ] ASF GitHub Bot commented on FLINK-2069: --- Github user aljoscha closed the pull request at: https://github.com/apache/flink/pull/724 writeAsCSV function in DataStream Scala API creates no file --- Key: FLINK-2069 URL: https://issues.apache.org/jira/browse/FLINK-2069 Project: Flink Issue Type: Bug Components: Streaming Reporter: Faye Beligianni Priority: Blocker Labels: Streaming Fix For: 0.9 When the {{writeAsCSV}} function is used in the DataStream Scala API, no file is created in the specified path. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[GitHub] flink pull request: [FLINK-2069] Fix Scala CSV Output Format
GitHub user aljoscha opened a pull request: https://github.com/apache/flink/pull/759 [FLINK-2069] Fix Scala CSV Output Format Before, the Scala Streaming API used the Java CsvOutputFormat. This is not compatible with Scala Tuples. The FileSinkFunction would silently swallow the thrown exceptions and the job would finish cleanly. You can merge this pull request into a Git repository by running: $ git pull https://github.com/aljoscha/flink fix-scala-csv-output Alternatively you can review and apply these changes as the patch at: https://github.com/apache/flink/pull/759.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #759 commit 16986ced8b6961bb2ada269b29dd4a5b59159983 Author: Aljoscha Krettek aljoscha.kret...@gmail.com Date: 2015-06-02T14:37:06Z [FLINK-2069] Fix Scala CSV Output Format
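The failure mode the PR describes (exceptions swallowed by the sink, so the job finishes cleanly while writing no file) can be illustrated with a minimal sketch. The class below is hypothetical and stands in for the behavior, not for Flink's actual FileSinkFunction:

```java
// Minimal illustration of a sink that swallows write failures: the job
// "succeeds" while producing no output. Hypothetical class, not
// Flink's actual FileSinkFunction or CsvOutputFormat.
public class SwallowingSink {
    private int recordsWritten = 0;
    private final boolean swallowExceptions;

    public SwallowingSink(boolean swallowExceptions) {
        this.swallowExceptions = swallowExceptions;
    }

    public void invoke(Object record) {
        try {
            write(record);
            recordsWritten++;
        } catch (RuntimeException e) {
            if (!swallowExceptions) {
                throw e; // propagating makes the failure visible and fails the job
            }
            // swallowed: the caller never learns that nothing was written
        }
    }

    // Stands in for an output format incompatible with the record type
    // (here: the Java CsvOutputFormat fed Scala tuples).
    private void write(Object record) {
        throw new RuntimeException("format cannot serialize record: " + record);
    }

    public int getRecordsWritten() {
        return recordsWritten;
    }
}
```

With swallowing enabled, zero records are written yet no error surfaces, which matches the reported symptom of "no file is created in the specified path".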
[GitHub] flink pull request: [ml] Rework of the optimization framework
Github user asfgit closed the pull request at: https://github.com/apache/flink/pull/758
[jira] [Commented] (FLINK-2133) Possible deadlock in ExecutionGraph
[ https://issues.apache.org/jira/browse/FLINK-2133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14569035#comment-14569035 ] Aljoscha Krettek commented on FLINK-2133: - Also, this is on my checkpoint-hardening branch: https://github.com/aljoscha/flink/tree/checkpoint-hardening I don't know if this also occurs on master. I'm not seeing this often, most of the travis runs go through. Possible deadlock in ExecutionGraph --- Key: FLINK-2133 URL: https://issues.apache.org/jira/browse/FLINK-2133 Project: Flink Issue Type: Bug Reporter: Aljoscha Krettek I had the following output on Travis: {code} Found one Java-level deadlock: = ForkJoinPool-1-worker-3: waiting to lock monitor 0x7f1c54af7eb8 (object 0xd77fa8c0, a org.apache.flink.runtime.util.SerializableObject), which is held by flink-akka.actor.default-dispatcher-4 flink-akka.actor.default-dispatcher-4: waiting to lock monitor 0x7f1c5486aca0 (object 0xd77fa218, a org.apache.flink.runtime.util.SerializableObject), which is held by ForkJoinPool-1-worker-3 Java stack information for the threads listed above: === ForkJoinPool-1-worker-3: at org.apache.flink.runtime.executiongraph.ExecutionJobVertex.resetForNewExecution(ExecutionJobVertex.java:338) - waiting to lock 0xd77fa8c0 (a org.apache.flink.runtime.util.SerializableObject) at org.apache.flink.runtime.executiongraph.ExecutionGraph.restart(ExecutionGraph.java:595) - locked 0xd77fa218 (a org.apache.flink.runtime.util.SerializableObject) at org.apache.flink.runtime.executiongraph.ExecutionGraph$3.call(ExecutionGraph.java:733) at akka.dispatch.Futures$$anonfun$future$1.apply(Future.scala:94) at scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24) at scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24) at scala.concurrent.impl.ExecutionContextImpl$$anon$3.exec(ExecutionContextImpl.scala:107) at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260) at 
scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339) at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979) at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107) flink-akka.actor.default-dispatcher-4: at org.apache.flink.runtime.executiongraph.ExecutionGraph.jobVertexInFinalState(ExecutionGraph.java:683) - waiting to lock 0xd77fa218 (a org.apache.flink.runtime.util.SerializableObject) at org.apache.flink.runtime.executiongraph.ExecutionJobVertex.subtaskInFinalState(ExecutionJobVertex.java:454) - locked 0xd77fa8c0 (a org.apache.flink.runtime.util.SerializableObject) at org.apache.flink.runtime.executiongraph.ExecutionJobVertex.vertexCancelled(ExecutionJobVertex.java:426) at org.apache.flink.runtime.executiongraph.ExecutionVertex.executionCanceled(ExecutionVertex.java:565) at org.apache.flink.runtime.executiongraph.Execution.cancelingComplete(Execution.java:653) at org.apache.flink.runtime.executiongraph.ExecutionGraph.updateState(ExecutionGraph.java:784) at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$receiveWithLogMessages$1$$anonfun$applyOrElse$2.apply$mcV$sp(JobManager.scala:220) at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$receiveWithLogMessages$1$$anonfun$applyOrElse$2.apply(JobManager.scala:219) at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$receiveWithLogMessages$1$$anonfun$applyOrElse$2.apply(JobManager.scala:219) at scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24) at scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24) at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:41) at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:401) at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260) at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.pollAndExecAll(ForkJoinPool.java:1253) at 
scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1346) at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979) at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107) Found 1 deadlock. {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-1319) Add static code analysis for UDFs
[ https://issues.apache.org/jira/browse/FLINK-1319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14569042#comment-14569042 ] ASF GitHub Bot commented on FLINK-1319: --- Github user rmetzger commented on the pull request: https://github.com/apache/flink/pull/729#issuecomment-107941730 Naming UDF. The feedback of committers giving talks about Flink some time ago was that the name UDF was sometimes confusing. @rmetzger, can you confirm this? We might take this into account and rename the UdfAnalysisMode to something else, for example just CodeAnalysisMode. That's right. People associate UDFs with SQL databases that allow passing in custom functions (which is right, but they start thinking Flink is a SQL database). In this case, it's not super critical because it's internal code. Add static code analysis for UDFs - Key: FLINK-1319 URL: https://issues.apache.org/jira/browse/FLINK-1319 Project: Flink Issue Type: New Feature Components: Java API, Scala API Reporter: Stephan Ewen Assignee: Timo Walther Priority: Minor Flink's Optimizer takes information that tells it, for UDFs, which fields of the input elements are accessed, modified, or forwarded/copied. This information frequently helps to reuse partitionings, sorts, etc. It may speed up programs significantly, as it can frequently eliminate sorts and shuffles, which are costly. Right now, users can add lightweight annotations to UDFs to provide this information (such as adding {{@ConstantFields(0->3, 1, 2->1)}}). We worked with static code analysis of UDFs before to determine this information automatically. This is an incredible feature, as it magically makes programs faster. For record-at-a-time operations (Map, Reduce, FlatMap, Join, Cross), this works surprisingly well in many cases. We used the Soot toolkit for the static code analysis. Unfortunately, Soot is LGPL licensed and thus we did not include any of the code so far. 
I propose to add this functionality to Flink in the form of a drop-in addition, to work around the LGPL incompatibility with ASL 2.0. Users could simply download a special flink-code-analysis.jar and drop it into the lib folder to enable this functionality. We may even add a script to tools that downloads that library automatically into the lib folder. This should be legally fine, since we do not redistribute LGPL code and only dynamically link it (the incompatibility with ASL 2.0 is mainly in the patentability, if I remember correctly). Prior work on this has been done by [~aljoscha] and [~skunert], which could provide a code base to start with. *Appendix* Homepage of the Soot static analysis toolkit: http://www.sable.mcgill.ca/soot/ Papers on static analysis and for optimization: http://stratosphere.eu/assets/papers/EnablingOperatorReorderingSCA_12.pdf and http://stratosphere.eu/assets/papers/openingTheBlackBoxes_12.pdf Quick introduction to the Optimizer: http://stratosphere.eu/assets/papers/2014-VLDBJ_Stratosphere_Overview.pdf (Section 6) Optimizer for Iterations: http://stratosphere.eu/assets/papers/spinningFastIterativeDataFlows_12.pdf (Sections 4.3 and 5.3) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-2133) Possible deadlock in ExecutionGraph
[ https://issues.apache.org/jira/browse/FLINK-2133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14569062#comment-14569062 ] Ufuk Celebi commented on FLINK-2133: I've looked at the ExecutionGraph and this seems to be a simple deadlock due to the ordering of lock acquisitions. Two tasks of the same JobVertex acquire the locks in the following order: - T1 (ForkJoinPool-1-worker-3): ExecutionGraph#restart() acquires ExecutionGraph#progressLock => ExecutionJobVertex#reset() acquires ExecutionJobVertex#stateMonitor - T2 (flink-akka.actor.default-dispatcher-4): ExecutionJobVertex#subtaskInFinalState acquires ExecutionJobVertex#stateMonitor to cancel task => ExecutionGraph#jobVertexInFinalState() acquires ExecutionGraph#progressLock I think that both messages have to be triggered by the same task, because both actions should only happen for the final vertex (I think cancel (transition to cancelling) and canceling complete msg (transition to cancelled)). Possible deadlock in ExecutionGraph --- Key: FLINK-2133 URL: https://issues.apache.org/jira/browse/FLINK-2133 Project: Flink Issue Type: Bug Reporter: Aljoscha Krettek I had the following output on Travis: {code} Found one Java-level deadlock: = ForkJoinPool-1-worker-3: waiting to lock monitor 0x7f1c54af7eb8 (object 0xd77fa8c0, a org.apache.flink.runtime.util.SerializableObject), which is held by flink-akka.actor.default-dispatcher-4 flink-akka.actor.default-dispatcher-4: waiting to lock monitor 0x7f1c5486aca0 (object 0xd77fa218, a org.apache.flink.runtime.util.SerializableObject), which is held by ForkJoinPool-1-worker-3 Java stack information for the threads listed above: === ForkJoinPool-1-worker-3: at org.apache.flink.runtime.executiongraph.ExecutionJobVertex.resetForNewExecution(ExecutionJobVertex.java:338) - waiting to lock 0xd77fa8c0 (a org.apache.flink.runtime.util.SerializableObject) at org.apache.flink.runtime.executiongraph.ExecutionGraph.restart(ExecutionGraph.java:595) - 
locked 0xd77fa218 (a org.apache.flink.runtime.util.SerializableObject) at org.apache.flink.runtime.executiongraph.ExecutionGraph$3.call(ExecutionGraph.java:733) at akka.dispatch.Futures$$anonfun$future$1.apply(Future.scala:94) at scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24) at scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24) at scala.concurrent.impl.ExecutionContextImpl$$anon$3.exec(ExecutionContextImpl.scala:107) at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260) at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339) at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979) at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107) flink-akka.actor.default-dispatcher-4: at org.apache.flink.runtime.executiongraph.ExecutionGraph.jobVertexInFinalState(ExecutionGraph.java:683) - waiting to lock 0xd77fa218 (a org.apache.flink.runtime.util.SerializableObject) at org.apache.flink.runtime.executiongraph.ExecutionJobVertex.subtaskInFinalState(ExecutionJobVertex.java:454) - locked 0xd77fa8c0 (a org.apache.flink.runtime.util.SerializableObject) at org.apache.flink.runtime.executiongraph.ExecutionJobVertex.vertexCancelled(ExecutionJobVertex.java:426) at org.apache.flink.runtime.executiongraph.ExecutionVertex.executionCanceled(ExecutionVertex.java:565) at org.apache.flink.runtime.executiongraph.Execution.cancelingComplete(Execution.java:653) at org.apache.flink.runtime.executiongraph.ExecutionGraph.updateState(ExecutionGraph.java:784) at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$receiveWithLogMessages$1$$anonfun$applyOrElse$2.apply$mcV$sp(JobManager.scala:220) at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$receiveWithLogMessages$1$$anonfun$applyOrElse$2.apply(JobManager.scala:219) at 
org.apache.flink.runtime.jobmanager.JobManager$$anonfun$receiveWithLogMessages$1$$anonfun$applyOrElse$2.apply(JobManager.scala:219) at scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24) at scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24) at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:41) at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:401) at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
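The deadlock reported above is the classic hold-and-wait cycle: the two threads take the same pair of monitors (`progressLock` on the ExecutionGraph, `stateMonitor` on the ExecutionJobVertex) in opposite orders. The standard fix is a single global acquisition order; a minimal sketch follows, with lock names mirroring the report but everything else illustrative, not the actual ExecutionGraph code:

```java
// Sketch of the lock-ordering fix: both code paths acquire
// progressLock before stateMonitor, so a hold-and-wait cycle between
// the two monitors is impossible. Illustrative only.
public class OrderedLocks {
    private final Object progressLock = new Object(); // ExecutionGraph-level
    private final Object stateMonitor = new Object(); // ExecutionJobVertex-level

    public String restart() {
        synchronized (progressLock) {
            synchronized (stateMonitor) {
                return "restarted";
            }
        }
    }

    public String subtaskInFinalState() {
        // In the reported trace this path locked stateMonitor first;
        // taking progressLock first matches restart() and breaks the cycle.
        synchronized (progressLock) {
            synchronized (stateMonitor) {
                return "final";
            }
        }
    }
}
```

With both paths agreeing on the order, whichever thread gets `progressLock` first can always proceed to `stateMonitor` without waiting on the other.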
[jira] [Commented] (FLINK-2130) RabbitMQ source does not fail when failing to retrieve elements
[ https://issues.apache.org/jira/browse/FLINK-2130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14569096#comment-14569096 ] Márton Balassi commented on FLINK-2130: --- This was a conscious design choice, so that we do not fail a whole source task just because one single element was not read properly. Open and close failures are propagated. Would you suggest doing this for every failed message? RabbitMQ source does not fail when failing to retrieve elements --- Key: FLINK-2130 URL: https://issues.apache.org/jira/browse/FLINK-2130 Project: Flink Issue Type: Bug Reporter: Stephan Ewen The RMQ source only logs when elements cannot be retrieved. Failures are not propagated. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
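The design question in the comment above, log-and-skip versus propagate, can be sketched as a policy switch. All names here are hypothetical illustrations, not the RMQ source's actual API:

```java
// Hypothetical sketch of per-element failure handling in a source:
// LOG_AND_SKIP keeps the task alive past bad elements, PROPAGATE fails
// the whole source task on the first bad element.
public class FailurePolicyDemo {
    public enum OnFailure { LOG_AND_SKIP, PROPAGATE }

    /** Returns the number of elements emitted; a null message stands in for a failed read. */
    public static int consume(String[] messages, OnFailure policy) {
        int emitted = 0;
        for (String msg : messages) {
            try {
                if (msg == null) {
                    throw new RuntimeException("failed to retrieve element");
                }
                emitted++; // stands in for collector.collect(msg)
            } catch (RuntimeException e) {
                if (policy == OnFailure.PROPAGATE) {
                    throw e; // the source task fails visibly
                }
                // LOG_AND_SKIP: log the error and continue consuming
            }
        }
        return emitted;
    }
}
```

A middle ground, also worth considering, is making the policy configurable per source rather than hard-coding either behavior.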
[jira] [Commented] (FLINK-2133) Possible deadlock in ExecutionGraph
[ https://issues.apache.org/jira/browse/FLINK-2133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14569032#comment-14569032 ] Aljoscha Krettek commented on FLINK-2133: - This was the whole travis run: https://travis-ci.org/aljoscha/flink/jobs/65039616 Possible deadlock in ExecutionGraph --- Key: FLINK-2133 URL: https://issues.apache.org/jira/browse/FLINK-2133 Project: Flink Issue Type: Bug Reporter: Aljoscha Krettek I had the following output on Travis: {code} Found one Java-level deadlock: = ForkJoinPool-1-worker-3: waiting to lock monitor 0x7f1c54af7eb8 (object 0xd77fa8c0, a org.apache.flink.runtime.util.SerializableObject), which is held by flink-akka.actor.default-dispatcher-4 flink-akka.actor.default-dispatcher-4: waiting to lock monitor 0x7f1c5486aca0 (object 0xd77fa218, a org.apache.flink.runtime.util.SerializableObject), which is held by ForkJoinPool-1-worker-3 Java stack information for the threads listed above: === ForkJoinPool-1-worker-3: at org.apache.flink.runtime.executiongraph.ExecutionJobVertex.resetForNewExecution(ExecutionJobVertex.java:338) - waiting to lock 0xd77fa8c0 (a org.apache.flink.runtime.util.SerializableObject) at org.apache.flink.runtime.executiongraph.ExecutionGraph.restart(ExecutionGraph.java:595) - locked 0xd77fa218 (a org.apache.flink.runtime.util.SerializableObject) at org.apache.flink.runtime.executiongraph.ExecutionGraph$3.call(ExecutionGraph.java:733) at akka.dispatch.Futures$$anonfun$future$1.apply(Future.scala:94) at scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24) at scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24) at scala.concurrent.impl.ExecutionContextImpl$$anon$3.exec(ExecutionContextImpl.scala:107) at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260) at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339) at 
scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979) at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107) flink-akka.actor.default-dispatcher-4: at org.apache.flink.runtime.executiongraph.ExecutionGraph.jobVertexInFinalState(ExecutionGraph.java:683) - waiting to lock 0xd77fa218 (a org.apache.flink.runtime.util.SerializableObject) at org.apache.flink.runtime.executiongraph.ExecutionJobVertex.subtaskInFinalState(ExecutionJobVertex.java:454) - locked 0xd77fa8c0 (a org.apache.flink.runtime.util.SerializableObject) at org.apache.flink.runtime.executiongraph.ExecutionJobVertex.vertexCancelled(ExecutionJobVertex.java:426) at org.apache.flink.runtime.executiongraph.ExecutionVertex.executionCanceled(ExecutionVertex.java:565) at org.apache.flink.runtime.executiongraph.Execution.cancelingComplete(Execution.java:653) at org.apache.flink.runtime.executiongraph.ExecutionGraph.updateState(ExecutionGraph.java:784) at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$receiveWithLogMessages$1$$anonfun$applyOrElse$2.apply$mcV$sp(JobManager.scala:220) at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$receiveWithLogMessages$1$$anonfun$applyOrElse$2.apply(JobManager.scala:219) at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$receiveWithLogMessages$1$$anonfun$applyOrElse$2.apply(JobManager.scala:219) at scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24) at scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24) at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:41) at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:401) at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260) at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.pollAndExecAll(ForkJoinPool.java:1253) at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1346) at 
scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979) at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107) Found 1 deadlock. {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Closed] (FLINK-1430) Add test for streaming scala api completeness
[ https://issues.apache.org/jira/browse/FLINK-1430?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Márton Balassi closed FLINK-1430. - Resolution: Implemented Fix Version/s: 0.9 Implemented via 50c818d Add test for streaming scala api completeness - Key: FLINK-1430 URL: https://issues.apache.org/jira/browse/FLINK-1430 Project: Flink Issue Type: Test Components: Streaming Affects Versions: 0.9 Reporter: Márton Balassi Assignee: Márton Balassi Fix For: 0.9 Currently the completeness of the streaming scala api is not tested. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (FLINK-2133) Possible deadlock in ExecutionGraph
Aljoscha Krettek created FLINK-2133: --- Summary: Possible deadlock in ExecutionGraph Key: FLINK-2133 URL: https://issues.apache.org/jira/browse/FLINK-2133 Project: Flink Issue Type: Bug Reporter: Aljoscha Krettek I had the following output on Travis: {code} Found one Java-level deadlock: = ForkJoinPool-1-worker-3: waiting to lock monitor 0x7f1c54af7eb8 (object 0xd77fa8c0, a org.apache.flink.runtime.util.SerializableObject), which is held by flink-akka.actor.default-dispatcher-4 flink-akka.actor.default-dispatcher-4: waiting to lock monitor 0x7f1c5486aca0 (object 0xd77fa218, a org.apache.flink.runtime.util.SerializableObject), which is held by ForkJoinPool-1-worker-3 Java stack information for the threads listed above: === ForkJoinPool-1-worker-3: at org.apache.flink.runtime.executiongraph.ExecutionJobVertex.resetForNewExecution(ExecutionJobVertex.java:338) - waiting to lock 0xd77fa8c0 (a org.apache.flink.runtime.util.SerializableObject) at org.apache.flink.runtime.executiongraph.ExecutionGraph.restart(ExecutionGraph.java:595) - locked 0xd77fa218 (a org.apache.flink.runtime.util.SerializableObject) at org.apache.flink.runtime.executiongraph.ExecutionGraph$3.call(ExecutionGraph.java:733) at akka.dispatch.Futures$$anonfun$future$1.apply(Future.scala:94) at scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24) at scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24) at scala.concurrent.impl.ExecutionContextImpl$$anon$3.exec(ExecutionContextImpl.scala:107) at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260) at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339) at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979) at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107) flink-akka.actor.default-dispatcher-4: at org.apache.flink.runtime.executiongraph.ExecutionGraph.jobVertexInFinalState(ExecutionGraph.java:683) - 
waiting to lock 0xd77fa218 (a org.apache.flink.runtime.util.SerializableObject) at org.apache.flink.runtime.executiongraph.ExecutionJobVertex.subtaskInFinalState(ExecutionJobVertex.java:454) - locked 0xd77fa8c0 (a org.apache.flink.runtime.util.SerializableObject) at org.apache.flink.runtime.executiongraph.ExecutionJobVertex.vertexCancelled(ExecutionJobVertex.java:426) at org.apache.flink.runtime.executiongraph.ExecutionVertex.executionCanceled(ExecutionVertex.java:565) at org.apache.flink.runtime.executiongraph.Execution.cancelingComplete(Execution.java:653) at org.apache.flink.runtime.executiongraph.ExecutionGraph.updateState(ExecutionGraph.java:784) at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$receiveWithLogMessages$1$$anonfun$applyOrElse$2.apply$mcV$sp(JobManager.scala:220) at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$receiveWithLogMessages$1$$anonfun$applyOrElse$2.apply(JobManager.scala:219) at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$receiveWithLogMessages$1$$anonfun$applyOrElse$2.apply(JobManager.scala:219) at scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24) at scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24) at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:41) at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:401) at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260) at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.pollAndExecAll(ForkJoinPool.java:1253) at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1346) at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979) at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107) Found 1 deadlock. {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (FLINK-2134) Deadlock in SuccessAfterNetworkBuffersFailureITCase
Ufuk Celebi created FLINK-2134: -- Summary: Deadlock in SuccessAfterNetworkBuffersFailureITCase Key: FLINK-2134 URL: https://issues.apache.org/jira/browse/FLINK-2134 Project: Flink Issue Type: Bug Affects Versions: master Reporter: Ufuk Celebi I ran into the issue in a Travis run for a PR: https://s3.amazonaws.com/archive.travis-ci.org/jobs/64994288/log.txt I can reproduce this locally by running SuccessAfterNetworkBuffersFailureITCase multiple times: {code} cluster = new ForkableFlinkMiniCluster(config, false); for (int i = 0; i < 100; i++) { // run test programs CC, KMeans, CC } {code} The iteration tasks wait for superstep notifications like this: {code} Join (Join at runConnectedComponents(SuccessAfterNetworkBuffersFailureITCase.java:128)) (8/6) daemon prio=5 tid=0x7f95f374f800 nid=0x138a7 in Object.wait() [0x000123f2a000] java.lang.Thread.State: TIMED_WAITING (on object monitor) at java.lang.Object.wait(Native Method) - waiting on 0x0007f89e3440 (a java.lang.Object) at org.apache.flink.runtime.iterative.concurrent.SuperstepKickoffLatch.awaitStartOfSuperstepOrTermination(SuperstepKickoffLatch.java:57) - locked 0x0007f89e3440 (a java.lang.Object) at org.apache.flink.runtime.iterative.task.IterationTailPactTask.run(IterationTailPactTask.java:131) at org.apache.flink.runtime.operators.RegularPactTask.invoke(RegularPactTask.java:362) at org.apache.flink.runtime.taskmanager.Task.run(Task.java:559) at java.lang.Thread.run(Thread.java:745) {code} I've asked [~rmetzger] to reproduce this and it deadlocks for him as well. The system needs to be under some load for this to occur after multiple runs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[GitHub] flink pull request: [FLINK-1319][core] Add static code analysis fo...
Github user rmetzger commented on the pull request: https://github.com/apache/flink/pull/729#issuecomment-107941730 Naming UDF. The feedback of committers giving talks about Flink some time ago was that the name UDF was sometimes confusing. @rmetzger, can you confirm this? We might take this into account and rename the UdfAnalysisMode to something else, for example just CodeAnalysisMode. That's right. People associate UDFs with SQL databases that allow passing in custom functions (which is right, but they start thinking Flink is a SQL database). In this case, it's not super critical because it's internal code.
[jira] [Resolved] (FLINK-2128) ScalaShellITSuite failing
[ https://issues.apache.org/jira/browse/FLINK-2128?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ufuk Celebi resolved FLINK-2128. Resolution: Duplicate ScalaShellITSuite failing - Key: FLINK-2128 URL: https://issues.apache.org/jira/browse/FLINK-2128 Project: Flink Issue Type: Bug Components: Scala Shell Affects Versions: master Reporter: Ufuk Celebi https://s3.amazonaws.com/archive.travis-ci.org/jobs/64947781/log.txt {code} ScalaShellITSuite: log4j:ERROR setFile(null,true) call failed. java.io.FileNotFoundException: /.log (Permission denied) at java.io.FileOutputStream.open(Native Method) at java.io.FileOutputStream.<init>(FileOutputStream.java:221) at java.io.FileOutputStream.<init>(FileOutputStream.java:142) at org.apache.log4j.FileAppender.setFile(FileAppender.java:294) at org.apache.log4j.FileAppender.activateOptions(FileAppender.java:165) at org.apache.log4j.config.PropertySetter.activate(PropertySetter.java:307) at org.apache.log4j.config.PropertySetter.setProperties(PropertySetter.java:172) at org.apache.log4j.config.PropertySetter.setProperties(PropertySetter.java:104) at org.apache.log4j.PropertyConfigurator.parseAppender(PropertyConfigurator.java:842) at org.apache.log4j.PropertyConfigurator.parseCategory(PropertyConfigurator.java:768) at org.apache.log4j.PropertyConfigurator.configureRootCategory(PropertyConfigurator.java:648) at org.apache.log4j.PropertyConfigurator.doConfigure(PropertyConfigurator.java:514) at org.apache.log4j.PropertyConfigurator.doConfigure(PropertyConfigurator.java:580) at org.apache.log4j.helpers.OptionConverter.selectAndConfigure(OptionConverter.java:526) at org.apache.log4j.LogManager.<clinit>(LogManager.java:127) at org.slf4j.impl.Log4jLoggerFactory.getLogger(Log4jLoggerFactory.java:66) at org.slf4j.LoggerFactory.getLogger(LoggerFactory.java:277) at org.slf4j.LoggerFactory.getLogger(LoggerFactory.java:288) at org.apache.flink.test.util.TestBaseUtils.<clinit>(TestBaseUtils.java:69) at 
org.apache.flink.api.scala.ScalaShellITSuite.beforeAll(ScalaShellITSuite.scala:198) at org.scalatest.BeforeAndAfterAll$class.beforeAll(BeforeAndAfterAll.scala:187) at org.apache.flink.api.scala.ScalaShellITSuite.beforeAll(ScalaShellITSuite.scala:33) at org.scalatest.BeforeAndAfterAll$class.run(BeforeAndAfterAll.scala:253) at org.apache.flink.api.scala.ScalaShellITSuite.run(ScalaShellITSuite.scala:33) at org.scalatest.Suite$class.callExecuteOnSuite$1(Suite.scala:1492) at org.scalatest.Suite$$anonfun$runNestedSuites$1.apply(Suite.scala:1528) at org.scalatest.Suite$$anonfun$runNestedSuites$1.apply(Suite.scala:1526) at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33) at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:108) at org.scalatest.Suite$class.runNestedSuites(Suite.scala:1526) at org.scalatest.tools.DiscoverySuite.runNestedSuites(DiscoverySuite.scala:29) at org.scalatest.Suite$class.run(Suite.scala:1421) at org.scalatest.tools.DiscoverySuite.run(DiscoverySuite.scala:29) at org.scalatest.tools.SuiteRunner.run(SuiteRunner.scala:55) at org.scalatest.tools.Runner$$anonfun$doRunRunRunDaDoRunRun$3.apply(Runner.scala:2563) at org.scalatest.tools.Runner$$anonfun$doRunRunRunDaDoRunRun$3.apply(Runner.scala:2557) at scala.collection.immutable.List.foreach(List.scala:318) at org.scalatest.tools.Runner$.doRunRunRunDaDoRunRun(Runner.scala:2557) at org.scalatest.tools.Runner$$anonfun$runOptionallyWithPassFailReporter$2.apply(Runner.scala:1044) at org.scalatest.tools.Runner$$anonfun$runOptionallyWithPassFailReporter$2.apply(Runner.scala:1043) at org.scalatest.tools.Runner$.withClassLoaderAndDispatchReporter(Runner.scala:2722) at org.scalatest.tools.Runner$.runOptionallyWithPassFailReporter(Runner.scala:1043) at org.scalatest.tools.Runner$.main(Runner.scala:860) at org.scalatest.tools.Runner.main(Runner.scala) {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
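[Editor's note] For context on where the odd {{/.log}} path can come from: log4j 1.x's {{PropertyConfigurator}} substitutes an unset {{${...}}} variable with the empty string, so a {{File}} option built from such a variable collapses to a path at the filesystem root, which an unprivileged CI user cannot create. The fragment below is a hypothetical sketch; the property and variable names are made up and may differ from Flink's actual test logging configuration.

```properties
# Hypothetical sketch, not Flink's real test config: if ${log.dir}
# is never set as a system property, log4j substitutes the empty
# string and the File option resolves to "/.log". Creating that file
# at the filesystem root then fails with
# java.io.FileNotFoundException: /.log (Permission denied).
log4j.rootLogger=INFO, file
log4j.appender.file=org.apache.log4j.FileAppender
log4j.appender.file.File=${log.dir}/.log
log4j.appender.file.layout=org.apache.log4j.PatternLayout
```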
[GitHub] flink pull request: [FLINK-1319][core] Add static code analysis fo...
Github user uce commented on a diff in the pull request: https://github.com/apache/flink/pull/729#discussion_r31531993 --- Diff: flink-java/src/main/java/org/apache/flink/api/java/operators/SingleInputUdfOperator.java --- @@ -54,8 +54,11 @@ private Map<String, DataSet<?>> broadcastVariables; + // NOTE: only set this variable via setSemanticProperties() --- End diff -- I saw that there is a check for this in the UdfAnalyzerUtil. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[jira] [Commented] (FLINK-1993) Replace MultipleLinearRegression's custom SGD with optimization framework's SGD
[ https://issues.apache.org/jira/browse/FLINK-1993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14569220#comment-14569220 ] ASF GitHub Bot commented on FLINK-1993: --- GitHub user tillrohrmann opened a pull request: https://github.com/apache/flink/pull/760 [FLINK-1993] [ml] Replaces custom SGD in MultipleLinearRegression with optimizer's SGD This PR replaces the custom SGD implementation with the optimization framework's SGD implementation. You can merge this pull request into a Git repository by running: $ git pull https://github.com/tillrohrmann/flink replaceSGDInMLR Alternatively you can review and apply these changes as the patch at: https://github.com/apache/flink/pull/760.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #760 commit 2df0c700237af0456c3770c198ed595fdf205408 Author: Till Rohrmann trohrm...@apache.org Date: 2015-05-29T16:02:47Z [FLINK-1993] [ml] Replaces custom SGD logic with optimization framework's SGD in MultipleLinearRegression Fixes PipelineITSuite because of change MLR loss function Replace MultipleLinearRegression's custom SGD with optimization framework's SGD --- Key: FLINK-1993 URL: https://issues.apache.org/jira/browse/FLINK-1993 Project: Flink Issue Type: Task Components: Machine Learning Library Reporter: Till Rohrmann Assignee: Theodore Vasiloudis Priority: Minor Labels: ML Fix For: 0.9 The current implementation of MultipleLinearRegression uses a custom SGD implementation. Flink's optimization framework also contains a SGD optimizer which should replace the custom implementation once the framework is merged. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[GitHub] flink pull request: [FLINK-1319][core] Add static code analysis fo...
Github user uce commented on a diff in the pull request: https://github.com/apache/flink/pull/729#discussion_r31532320 --- Diff: flink-java/src/main/java/org/apache/flink/api/java/sca/UdfAnalyzer.java --- @@ -0,0 +1,431 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * License); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an AS IS BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.flink.api.java.sca; + +import org.apache.flink.api.common.functions.CoGroupFunction; +import org.apache.flink.api.common.functions.CrossFunction; +import org.apache.flink.api.common.functions.FilterFunction; +import org.apache.flink.api.common.functions.FlatJoinFunction; +import org.apache.flink.api.common.functions.FlatMapFunction; +import org.apache.flink.api.common.functions.GroupReduceFunction; +import org.apache.flink.api.common.functions.JoinFunction; +import org.apache.flink.api.common.functions.MapFunction; +import org.apache.flink.api.common.functions.ReduceFunction; +import org.apache.flink.api.common.operators.DualInputSemanticProperties; +import org.apache.flink.api.common.operators.SemanticProperties; +import org.apache.flink.api.common.operators.SingleInputSemanticProperties; +import org.apache.flink.api.common.typeinfo.TypeInformation; +import org.apache.flink.api.java.functions.SemanticPropUtil; +import org.apache.flink.api.java.operators.Keys; +import org.apache.flink.api.java.operators.Keys.ExpressionKeys; +import org.apache.flink.api.java.sca.TaggedValue.Input; +import org.objectweb.asm.Type; +import org.objectweb.asm.tree.MethodNode; + +import java.lang.reflect.Method; +import java.util.ArrayList; +import java.util.List; + +import static org.apache.flink.api.java.sca.UdfAnalyzerUtils.convertTypeInfoToTaggedValue; +import static org.apache.flink.api.java.sca.UdfAnalyzerUtils.findMethodNode; +import static org.apache.flink.api.java.sca.UdfAnalyzerUtils.mergeReturnValues; +import static org.apache.flink.api.java.sca.UdfAnalyzerUtils.removeUngroupedInputsFromContainer; + +public class UdfAnalyzer { + // exclusion to suppress hints for API operators + private static final String EXCLUDED_CLASSPATH = "org/apache/flink"; --- End diff -- Instead of excluding this class path, the consensus from reviews so far is to add an `@SkipCodeAnalysis` annotation. This will allow new users to play around with the Flink examples etc. 
--- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] flink pull request: [FLINK-1319][core] Add static code analysis fo...
Github user uce commented on a diff in the pull request: https://github.com/apache/flink/pull/729#discussion_r31532974 --- Diff: flink-java/src/main/java/org/apache/flink/api/java/operators/UdfOperatorUtils.java --- @@ -0,0 +1,94 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * License); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an AS IS BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.flink.api.java.operators; + +import org.apache.flink.api.common.UdfAnalysisMode; +import org.apache.flink.api.common.functions.Function; +import org.apache.flink.api.common.functions.InvalidTypesException; +import org.apache.flink.api.common.operators.DualInputSemanticProperties; +import org.apache.flink.api.common.operators.SingleInputSemanticProperties; +import org.apache.flink.api.java.sca.UdfAnalyzer; +import org.apache.flink.api.java.sca.UdfAnalyzerException; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +public abstract class UdfOperatorUtils { + + private static final Logger LOG = LoggerFactory.getLogger(UdfOperatorUtils.class); + + public static void analyzeSingleInputUdf(SingleInputUdfOperator?, ?, ? operator, Class? udfBaseClass, + Function udf, Keys? 
key) { + final UdfAnalysisMode mode = operator.getExecutionEnvironment().getConfig().getUdfAnalysisMode(); + if (mode != UdfAnalysisMode.DISABLED) { + try { + final UdfAnalyzer analyzer = new UdfAnalyzer(udfBaseClass, udf.getClass(), operator.getInputType(), null, + operator.getResultType(), key, null, mode == UdfAnalysisMode.OPTIMIZING_ENABLED); + final boolean success = analyzer.analyze(); + if (success) { + if (mode == UdfAnalysisMode.OPTIMIZING_ENABLED + && !operator.udfWithForwardedFieldsAnnotation(udf.getClass())) { + operator.setSemanticProperties((SingleInputSemanticProperties) analyzer.getSemanticProperties()); + operator.setAnalyzedUdfSemanticsFlag(); --- End diff -- I think it would make sense to also print the inferred forwarded fields (at least for debugging purposes). --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[jira] [Commented] (FLINK-1319) Add static code analysis for UDFs
[ https://issues.apache.org/jira/browse/FLINK-1319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14569242#comment-14569242 ] ASF GitHub Bot commented on FLINK-1319: --- Github user uce commented on a diff in the pull request: https://github.com/apache/flink/pull/729#discussion_r31532974 --- Diff: flink-java/src/main/java/org/apache/flink/api/java/operators/UdfOperatorUtils.java --- @@ -0,0 +1,94 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * License); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an AS IS BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.flink.api.java.operators; + +import org.apache.flink.api.common.UdfAnalysisMode; +import org.apache.flink.api.common.functions.Function; +import org.apache.flink.api.common.functions.InvalidTypesException; +import org.apache.flink.api.common.operators.DualInputSemanticProperties; +import org.apache.flink.api.common.operators.SingleInputSemanticProperties; +import org.apache.flink.api.java.sca.UdfAnalyzer; +import org.apache.flink.api.java.sca.UdfAnalyzerException; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +public abstract class UdfOperatorUtils { + + private static final Logger LOG = LoggerFactory.getLogger(UdfOperatorUtils.class); + + public static void analyzeSingleInputUdf(SingleInputUdfOperator?, ?, ? operator, Class? udfBaseClass, + Function udf, Keys? key) { + final UdfAnalysisMode mode = operator.getExecutionEnvironment().getConfig().getUdfAnalysisMode(); + if (mode != UdfAnalysisMode.DISABLED) { + try { + final UdfAnalyzer analyzer = new UdfAnalyzer(udfBaseClass, udf.getClass(), operator.getInputType(), null, + operator.getResultType(), key, null, mode == UdfAnalysisMode.OPTIMIZING_ENABLED); + final boolean success = analyzer.analyze(); + if (success) { + if (mode == UdfAnalysisMode.OPTIMIZING_ENABLED + !operator.udfWithForwardedFieldsAnnotation(udf.getClass())) { + operator.setSemanticProperties((SingleInputSemanticProperties) analyzer.getSemanticProperties()); + operator.setAnalyzedUdfSemanticsFlag(); --- End diff -- I think it would make sense to also print the inferred forwarded fields (at least for debugging purposes). Add static code analysis for UDFs - Key: FLINK-1319 URL: https://issues.apache.org/jira/browse/FLINK-1319 Project: Flink Issue Type: New Feature Components: Java API, Scala API Reporter: Stephan Ewen Assignee: Timo Walther Priority: Minor Flink's Optimizer takes information that tells it for UDFs which fields of the input elements are accessed, modified, or frwarded/copied. 
This information frequently helps to reuse partitionings, sorts, etc. It may speed up programs significantly, as it can frequently eliminate sorts and shuffles, which are costly. Right now, users can add lightweight annotations to UDFs to provide this information (such as adding {{@ConstantFields(0->3, 1, 2->1)}}). We worked with static code analysis of UDFs before, to determine this information automatically. This is an incredible feature, as it magically makes programs faster. For record-at-a-time operations (Map, Reduce, FlatMap, Join, Cross), this works surprisingly well in many cases. We used the Soot toolkit for the static code analysis. Unfortunately, Soot is LGPL licensed and thus we did not include any of the code so far. I propose to add this functionality to Flink, in the form of a drop-in addition, to work around the LGPL incompatibility with ASL 2.0. Users could simply download a special flink-code-analysis.jar and drop it into the lib
[jira] [Commented] (FLINK-1319) Add static code analysis for UDFs
[ https://issues.apache.org/jira/browse/FLINK-1319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14569218#comment-14569218 ] ASF GitHub Bot commented on FLINK-1319: --- Github user uce commented on a diff in the pull request: https://github.com/apache/flink/pull/729#discussion_r31531758 --- Diff: flink-java/src/main/java/org/apache/flink/api/java/operators/UdfOperatorUtils.java --- @@ -0,0 +1,94 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * License); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an AS IS BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.flink.api.java.operators; + +import org.apache.flink.api.common.UdfAnalysisMode; +import org.apache.flink.api.common.functions.Function; +import org.apache.flink.api.common.functions.InvalidTypesException; +import org.apache.flink.api.common.operators.DualInputSemanticProperties; +import org.apache.flink.api.common.operators.SingleInputSemanticProperties; +import org.apache.flink.api.java.sca.UdfAnalyzer; +import org.apache.flink.api.java.sca.UdfAnalyzerException; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +public abstract class UdfOperatorUtils { + + private static final Logger LOG = LoggerFactory.getLogger(UdfOperatorUtils.class); + + public static void analyzeSingleInputUdf(SingleInputUdfOperator?, ?, ? operator, Class? udfBaseClass, --- End diff -- I vote to pass the name of the operator as well. The log output will then be more consistent. Currently the class name is used. Add static code analysis for UDFs - Key: FLINK-1319 URL: https://issues.apache.org/jira/browse/FLINK-1319 Project: Flink Issue Type: New Feature Components: Java API, Scala API Reporter: Stephan Ewen Assignee: Timo Walther Priority: Minor Flink's Optimizer takes information that tells it for UDFs which fields of the input elements are accessed, modified, or frwarded/copied. This information frequently helps to reuse partitionings, sorts, etc. It may speed up programs significantly, as it can frequently eliminate sorts and shuffles, which are costly. Right now, users can add lightweight annotations to UDFs to provide this information (such as adding {{@ConstandFields(0-3, 1, 2-1)}}. We worked with static code analysis of UDFs before, to determine this information automatically. This is an incredible feature, as it magically makes programs faster. For record-at-a-time operations (Map, Reduce, FlatMap, Join, Cross), this works surprisingly well in many cases. We used the Soot toolkit for the static code analysis. 
Unfortunately, Soot is LGPL licensed and thus we did not include any of the code so far. I propose to add this functionality to Flink, in the form of a drop-in addition, to work around the LGPL incompatibility with ASL 2.0. Users could simply download a special flink-code-analysis.jar and drop it into the lib folder to enable this functionality. We may even add a script to tools that downloads that library automatically into the lib folder. This should be legally fine, since we do not redistribute LGPL code and only dynamically link it (the incompatibility with ASL 2.0 is mainly in the patentability, if I remember correctly). Prior work on this has been done by [~aljoscha] and [~skunert], which could provide a code base to start with. *Appendix* Homepage of the Soot static analysis toolkit: http://www.sable.mcgill.ca/soot/ Papers on static analysis and for optimization: http://stratosphere.eu/assets/papers/EnablingOperatorReorderingSCA_12.pdf and http://stratosphere.eu/assets/papers/openingTheBlackBoxes_12.pdf Quick introduction to the Optimizer: http://stratosphere.eu/assets/papers/2014-VLDBJ_Stratosphere_Overview.pdf (Section 6) Optimizer for Iterations: http://stratosphere.eu/assets/papers/spinningFastIterativeDataFlows_12.pdf (Sections 4.3 and 5.3) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
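[Editor's note] To make the "forwarded fields" idea in the issue above concrete, here is a toy sketch in plain Java (not Flink API; class and method names are made up). It shows a record-at-a-time function that leaves field 1 in place and moves field 2 to position 0 — exactly the kind of semantic information an annotation or the proposed static analysis would hand to the optimizer so it can reuse existing partitionings and sort orders on those fields instead of re-shuffling.

```java
// Toy illustration only. The "UDF" maps a 3-field record
// (a, b, c) -> (c, b, a + c). Field 1 is forwarded to position 1
// and field 2 is forwarded to position 0, so semantic information
// in the spirit of "2->0; 1" would apply to it.
public class ForwardedFieldsDemo {

    // record-at-a-time function over int[3] records
    static int[] udf(int[] rec) {
        // rec[2] forwarded to position 0, rec[1] kept at position 1,
        // position 2 is newly computed (NOT forwarded)
        return new int[] { rec[2], rec[1], rec[0] + rec[2] };
    }

    public static void main(String[] args) {
        int[] out = udf(new int[] { 7, 8, 9 });
        System.out.println(out[0] + "," + out[1] + "," + out[2]); // prints 9,8,16
    }
}
```

A sort on field 2 of the input is still a sort on field 0 of the output here, which is the optimization opportunity the annotation exposes.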
[GitHub] flink pull request: [FLINK-1993] [ml] Replaces custom SGD in Multi...
GitHub user tillrohrmann opened a pull request: https://github.com/apache/flink/pull/760 [FLINK-1993] [ml] Replaces custom SGD in MultipleLinearRegression with optimizer's SGD This PR replaces the custom SGD implementation with the optimization framework's SGD implementation. You can merge this pull request into a Git repository by running: $ git pull https://github.com/tillrohrmann/flink replaceSGDInMLR Alternatively you can review and apply these changes as the patch at: https://github.com/apache/flink/pull/760.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #760 commit 2df0c700237af0456c3770c198ed595fdf205408 Author: Till Rohrmann trohrm...@apache.org Date: 2015-05-29T16:02:47Z [FLINK-1993] [ml] Replaces custom SGD logic with optimization framework's SGD in MultipleLinearRegression Fixes PipelineITSuite because of change MLR loss function --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[jira] [Commented] (FLINK-1319) Add static code analysis for UDFs
[ https://issues.apache.org/jira/browse/FLINK-1319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14569271#comment-14569271 ] ASF GitHub Bot commented on FLINK-1319: --- Github user uce commented on a diff in the pull request: https://github.com/apache/flink/pull/729#discussion_r31535397 --- Diff: flink-java/src/main/java/org/apache/flink/api/java/operators/UdfOperatorUtils.java --- @@ -0,0 +1,94 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * License); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an AS IS BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.flink.api.java.operators; + +import org.apache.flink.api.common.UdfAnalysisMode; +import org.apache.flink.api.common.functions.Function; +import org.apache.flink.api.common.functions.InvalidTypesException; +import org.apache.flink.api.common.operators.DualInputSemanticProperties; +import org.apache.flink.api.common.operators.SingleInputSemanticProperties; +import org.apache.flink.api.java.sca.UdfAnalyzer; +import org.apache.flink.api.java.sca.UdfAnalyzerException; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +public abstract class UdfOperatorUtils { + + private static final Logger LOG = LoggerFactory.getLogger(UdfOperatorUtils.class); + + public static void analyzeSingleInputUdf(SingleInputUdfOperator?, ?, ? operator, Class? udfBaseClass, + Function udf, Keys? key) { + final UdfAnalysisMode mode = operator.getExecutionEnvironment().getConfig().getUdfAnalysisMode(); + if (mode != UdfAnalysisMode.DISABLED) { + try { + final UdfAnalyzer analyzer = new UdfAnalyzer(udfBaseClass, udf.getClass(), operator.getInputType(), null, + operator.getResultType(), key, null, mode == UdfAnalysisMode.OPTIMIZING_ENABLED); + final boolean success = analyzer.analyze(); + if (success) { + if (mode == UdfAnalysisMode.OPTIMIZING_ENABLED + !operator.udfWithForwardedFieldsAnnotation(udf.getClass())) { + operator.setSemanticProperties((SingleInputSemanticProperties) analyzer.getSemanticProperties()); + operator.setAnalyzedUdfSemanticsFlag(); + } + else if (mode == UdfAnalysisMode.HINTING_ENABLED) { + analyzer.addSemanticPropertiesHints(); + } + LOG.info(analyzer.getHintsString()); + } + } + catch (InvalidTypesException e) { + LOG.debug(Unable to do UDF analysis due to missing type information., e); + } + catch (UdfAnalyzerException e) { + LOG.debug(UDF analysis failed., e); + } + } + } + + public static void analyzeDualInputUdf(TwoInputUdfOperator?, ?, ?, ? operator, Class? udfBaseClass, + Function udf, Keys? key1, Keys? 
key2) { + final UdfAnalysisMode mode = operator.getExecutionEnvironment().getConfig().getUdfAnalysisMode(); + if (mode != UdfAnalysisMode.DISABLED) { --- End diff -- We could log that the analysis is disabled as well. Add static code analysis for UDFs - Key: FLINK-1319 URL: https://issues.apache.org/jira/browse/FLINK-1319 Project: Flink Issue Type: New Feature Components: Java API, Scala API Reporter: Stephan Ewen Assignee: Timo Walther Priority: Minor Flink's Optimizer takes information that tells it for UDFs which
[GitHub] flink pull request: [FLINK-1319][core] Add static code analysis fo...
Github user uce commented on a diff in the pull request: https://github.com/apache/flink/pull/729#discussion_r31535397 --- Diff: flink-java/src/main/java/org/apache/flink/api/java/operators/UdfOperatorUtils.java --- @@ -0,0 +1,94 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * License); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an AS IS BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.flink.api.java.operators; + +import org.apache.flink.api.common.UdfAnalysisMode; +import org.apache.flink.api.common.functions.Function; +import org.apache.flink.api.common.functions.InvalidTypesException; +import org.apache.flink.api.common.operators.DualInputSemanticProperties; +import org.apache.flink.api.common.operators.SingleInputSemanticProperties; +import org.apache.flink.api.java.sca.UdfAnalyzer; +import org.apache.flink.api.java.sca.UdfAnalyzerException; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +public abstract class UdfOperatorUtils { + + private static final Logger LOG = LoggerFactory.getLogger(UdfOperatorUtils.class); + + public static void analyzeSingleInputUdf(SingleInputUdfOperator?, ?, ? operator, Class? udfBaseClass, + Function udf, Keys? 
key) { + final UdfAnalysisMode mode = operator.getExecutionEnvironment().getConfig().getUdfAnalysisMode(); + if (mode != UdfAnalysisMode.DISABLED) { + try { + final UdfAnalyzer analyzer = new UdfAnalyzer(udfBaseClass, udf.getClass(), operator.getInputType(), null, + operator.getResultType(), key, null, mode == UdfAnalysisMode.OPTIMIZING_ENABLED); + final boolean success = analyzer.analyze(); + if (success) { + if (mode == UdfAnalysisMode.OPTIMIZING_ENABLED + && !operator.udfWithForwardedFieldsAnnotation(udf.getClass())) { + operator.setSemanticProperties((SingleInputSemanticProperties) analyzer.getSemanticProperties()); + operator.setAnalyzedUdfSemanticsFlag(); + } + else if (mode == UdfAnalysisMode.HINTING_ENABLED) { + analyzer.addSemanticPropertiesHints(); + } + LOG.info(analyzer.getHintsString()); + } + } + catch (InvalidTypesException e) { + LOG.debug("Unable to do UDF analysis due to missing type information.", e); + } + catch (UdfAnalyzerException e) { + LOG.debug("UDF analysis failed.", e); + } + } + } + + public static void analyzeDualInputUdf(TwoInputUdfOperator<?, ?, ?, ?> operator, Class<?> udfBaseClass, + Function udf, Keys<?> key1, Keys<?> key2) { + final UdfAnalysisMode mode = operator.getExecutionEnvironment().getConfig().getUdfAnalysisMode(); + if (mode != UdfAnalysisMode.DISABLED) { --- End diff -- We could log that the analysis is disabled as well. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---