[jira] [Resolved] (FLINK-2560) Flink-Avro Plugin cannot be handled by Eclipse
[ https://issues.apache.org/jira/browse/FLINK-2560?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chiwan Park resolved FLINK-2560. Resolution: Fixed Fix Version/s: 0.10 Fixed at 9c7f769388d90c3a79d8c08995d4eae892b23a6e Flink-Avro Plugin cannot be handled by Eclipse -- Key: FLINK-2560 URL: https://issues.apache.org/jira/browse/FLINK-2560 Project: Flink Issue Type: Improvement Reporter: Matthias J. Sax Assignee: Matthias J. Sax Priority: Trivial Fix For: 0.10 Eclipse always shows the following error: {noformat} Description ResourcePathLocationType Plugin execution not covered by lifecycle configuration: org.apache.avro:avro-maven-plugin:1.7.7:schema (execution: default, phase: generate-sources) pom.xml /flink-avro line 134 Maven Project Build Lifecycle Mapping problem {noformat} This can be fixed by disabling the plugin within Eclipse via {{pluginManagement}} ... {{lifecycleMappingMetadata}} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
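The workaround mentioned above is the standard m2e lifecycle-mapping stanza. A rough sketch of what it looks like in the pom.xml (the plugin coordinates match the error message above; the exact snippet in the referenced commit may differ):

```xml
<build>
  <pluginManagement>
    <plugins>
      <!-- Tell m2e (Eclipse) to ignore the avro-maven-plugin's schema goal -->
      <plugin>
        <groupId>org.eclipse.m2e</groupId>
        <artifactId>lifecycle-mapping</artifactId>
        <version>1.0.0</version>
        <configuration>
          <lifecycleMappingMetadata>
            <pluginExecutions>
              <pluginExecution>
                <pluginExecutionFilter>
                  <groupId>org.apache.avro</groupId>
                  <artifactId>avro-maven-plugin</artifactId>
                  <versionRange>[1.7.7,)</versionRange>
                  <goals>
                    <goal>schema</goal>
                  </goals>
                </pluginExecutionFilter>
                <action>
                  <ignore/>
                </action>
              </pluginExecution>
            </pluginExecutions>
          </lifecycleMappingMetadata>
        </configuration>
      </plugin>
    </plugins>
  </pluginManagement>
</build>
```

This only affects Eclipse's build; command-line Maven ignores the lifecycle-mapping plugin entirely.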
[jira] [Assigned] (FLINK-2557) Manual type information via returns fails in DataSet API
[ https://issues.apache.org/jira/browse/FLINK-2557?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chesnay Schepler reassigned FLINK-2557: --- Assignee: Chesnay Schepler Manual type information via returns fails in DataSet API -- Key: FLINK-2557 URL: https://issues.apache.org/jira/browse/FLINK-2557 Project: Flink Issue Type: Bug Components: Java API Reporter: Matthias J. Sax Assignee: Chesnay Schepler I changed the WordCount example as below and got an exception. The Tokenizer is changed to this (removed generics and added a cast to String):
{code:java}
public static final class Tokenizer implements FlatMapFunction {
    public void flatMap(Object value, Collector out) {
        String[] tokens = ((String) value).toLowerCase().split("\\W+");
        for (String token : tokens) {
            if (token.length() > 0) {
                out.collect(new Tuple2<String, Integer>(token, 1));
            }
        }
    }
}
{code}
I added a call to returns() here:
{code:java}
DataSet<Tuple2<String, Integer>> counts = text
    .flatMap(new Tokenizer()).returns("Tuple2<String, Integer>")
    .groupBy(0).sum(1);
{code}
The exception is: {noformat} Exception in thread "main" java.lang.IllegalArgumentException: The types of the interface org.apache.flink.api.common.functions.FlatMapFunction could not be inferred. Support for synthetic interfaces, lambdas, and generic types is limited at this point.
at org.apache.flink.api.java.typeutils.TypeExtractor.getParameterType(TypeExtractor.java:686)
at org.apache.flink.api.java.typeutils.TypeExtractor.getParameterTypeFromGenericType(TypeExtractor.java:710)
at org.apache.flink.api.java.typeutils.TypeExtractor.getParameterType(TypeExtractor.java:673)
at org.apache.flink.api.java.typeutils.TypeExtractor.privateCreateTypeInfo(TypeExtractor.java:365)
at org.apache.flink.api.java.typeutils.TypeExtractor.getUnaryOperatorReturnType(TypeExtractor.java:279)
at org.apache.flink.api.java.typeutils.TypeExtractor.getFlatMapReturnTypes(TypeExtractor.java:120)
at org.apache.flink.api.java.DataSet.flatMap(DataSet.java:262)
at org.apache.flink.examples.java.wordcount.WordCount.main(WordCount.java:69)
{noformat}
Fix: This should not fail immediately, but instead only give a MissingTypeInfo so that type hints would work. The error message is also wrong, btw: it should state that raw types are not supported.
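The proposed behavior can be illustrated independently of Flink's actual TypeExtractor internals with a small self-contained Java sketch: extraction failure yields a placeholder "missing" type info instead of throwing, and a later returns()-style hint fills it in. All names besides MissingTypeInfo are made up for this example, and the "raw type" test is a crude stand-in.

```java
// Hypothetical stand-ins for Flink's type-info hierarchy; illustration only.
class TypeInfo {
    final String name;
    TypeInfo(String name) { this.name = name; }
}

// Marker meaning "extraction failed, but a returns(...) hint may still fix it".
class MissingTypeInfo extends TypeInfo {
    final String reason;
    MissingTypeInfo(String reason) { super("<missing>"); this.reason = reason; }
}

class TypeHintSketch {
    // Instead of throwing when extraction is impossible, return MissingTypeInfo.
    // (Crude heuristic: a class with no type parameters gives us nothing to infer.)
    static TypeInfo extract(Class<?> clazz) {
        if (clazz.getTypeParameters().length == 0) {
            return new MissingTypeInfo("raw types are not supported");
        }
        return new TypeInfo(clazz.getSimpleName());
    }

    // returns(...)-style hint: only overrides a MissingTypeInfo.
    static TypeInfo applyHint(TypeInfo extracted, TypeInfo hint) {
        return (extracted instanceof MissingTypeInfo) ? hint : extracted;
    }
}
```

The point is the deferral: the pipeline only fails if no hint is supplied by the time the type is actually needed.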
[GitHub] flink pull request: [FLINK-2563] [gelly] changed Graph's run() to ...
Github user andralungu commented on the pull request: https://github.com/apache/flink/pull/1042#issuecomment-133881425 Exactly what I had in mind. +1 to merge --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[jira] [Commented] (FLINK-2017) Add predefined required parameters to ParameterTool
[ https://issues.apache.org/jira/browse/FLINK-2017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14708474#comment-14708474 ] Martin Liesenberg commented on FLINK-2017: -- The use of an Option object to encapsulate the parameters should probably be used in ParameterTool as well, right? What I have come up with is a generic Option class and a corresponding RequiredParameter class. Add predefined required parameters to ParameterTool --- Key: FLINK-2017 URL: https://issues.apache.org/jira/browse/FLINK-2017 Project: Flink Issue Type: Improvement Affects Versions: 0.9 Reporter: Robert Metzger Labels: starter In FLINK-1525 we've added the {{ParameterTool}}. During the PR review, there was a request for required parameters. This issue is about implementing a facility to define required parameters. The tool should also be able to print a help menu with a list of all parameters. This test case shows my initial ideas on how to design the API:
{code}
@Test
public void requiredParameters() {
    RequiredParameters required = new RequiredParameters();
    // parameter with long and short variant
    Option input = required.add("input").alt("i").help("Path to input file or directory");
    // parameter only with long variant
    required.add("output");
    // parameter with type
    Option parallelism = required.add("parallelism").alt("p").type(Integer.class);
    // parameter with default value, specifying the type
    Option spOption = required.add("sourceParallelism").alt("sp").defaultValue(12)
        .help("Number specifying the number of parallel data source instances");
    Option executionType = required.add("executionType").alt("et")
        .defaultValue("pipelined").choices("pipelined", "batch");

    ParameterUtil parameter = ParameterUtil.fromArgs(
        new String[]{"-i", "someinput", "--output", "someout", "-p", "15"});
    required.check(parameter);
    required.printHelp();
    required.checkAndPopulate(parameter);

    String inputString = input.get();
    int par = parallelism.getInteger();
    String output = parameter.get("output");
    int sourcePar = parameter.getInteger(spOption.getName());
}
{code}
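A minimal sketch of how such an Option / RequiredParameters pair could look. This is illustrative only: the method names follow the test case above, but the implementation is guessed and simplified (e.g. type(), check() and printHelp() are omitted, and defaults are kept as strings).

```java
import java.util.*;

// Illustrative Option with a fluent builder API, as proposed in the comment above.
class Option {
    final String longName;
    String shortName, helpText, defaultValue, value;
    List<String> choices = new ArrayList<>();

    Option(String longName) { this.longName = longName; }
    Option alt(String s) { this.shortName = s; return this; }
    Option help(String h) { this.helpText = h; return this; }
    Option defaultValue(String d) { this.defaultValue = d; return this; }
    Option choices(String... c) { this.choices = Arrays.asList(c); return this; }

    String get() { return value != null ? value : defaultValue; }
    int getInteger() { return Integer.parseInt(get()); }
}

class RequiredParameters {
    private final Map<String, Option> options = new LinkedHashMap<>();

    Option add(String longName) {
        Option o = new Option(longName);
        options.put(longName, o);
        return o;
    }

    // Verify every option is present (by long or short name) or has a default,
    // and populate Option.value so the typed getters work afterwards.
    void checkAndPopulate(Map<String, String> params) {
        for (Option o : options.values()) {
            String v = params.get(o.longName);
            if (v == null && o.shortName != null) v = params.get(o.shortName);
            if (v == null && o.defaultValue == null)
                throw new IllegalArgumentException("Missing required parameter: " + o.longName);
            if (v != null && !o.choices.isEmpty() && !o.choices.contains(v))
                throw new IllegalArgumentException("Invalid value for " + o.longName + ": " + v);
            o.value = v;
        }
    }
}
```

Validation and population happen in one pass; a help menu could iterate the same LinkedHashMap to print long/short names and help texts in declaration order.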
[jira] [Commented] (FLINK-2557) Manual type information via returns fails in DataSet API
[ https://issues.apache.org/jira/browse/FLINK-2557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14708505#comment-14708505 ] ASF GitHub Bot commented on FLINK-2557: --- GitHub user zentol opened a pull request: https://github.com/apache/flink/pull/1045 [FLINK-2557] TypeExtractor properly returns MissingTypeInfo This fix is not really obvious, so let me explain: getParameterType() is called from two different places in the TypeExtractor: to validate the input type and to extract the output type. Both cases consider the possibility that getParameterType() fails, but check for different exceptions. The TypeExtractor only returns a MissingTypeInfo if it encounters an InvalidTypesException; IllegalArgumentExceptions are not caught. This is what @mjsax encountered. Changing the exception type causes the TypeExtractor to properly return a MissingTypeInfo, which is later overridden by the returns(...) call. In order for the input validation to still work properly as well, it now catches InvalidTypesExceptions instead. You can merge this pull request into a Git repository by running: $ git pull https://github.com/zentol/flink 2557_types Alternatively you can review and apply these changes as the patch at: https://github.com/apache/flink/pull/1045.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #1045 commit 1c1dc459915c875ab0a4412aa3ef0a844f092171 Author: zentol s.mo...@web.de Date: 2015-08-23T19:41:44Z [FLINK-2557] TypeExtractor properly returns MissingTypeInfo Manual type information via returns fails in DataSet API -- Key: FLINK-2557 URL: https://issues.apache.org/jira/browse/FLINK-2557 Project: Flink Issue Type: Bug Components: Java API Reporter: Matthias J. Sax Assignee: Chesnay Schepler
[GitHub] flink pull request: [FLINK-2565] Support primitive Arrays as keys
GitHub user zentol opened a pull request: https://github.com/apache/flink/pull/1043 [FLINK-2565] Support primitive Arrays as keys Adds a comparator and test for every primitive array type. Modifies the CustomType2 class in GroupingTest to retain a field with an unsupported type. You can merge this pull request into a Git repository by running: $ git pull https://github.com/zentol/flink 2565_arrayKey Alternatively you can review and apply these changes as the patch at: https://github.com/apache/flink/pull/1043.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #1043 commit 7551a47e60186a91ecc1df364f1a3ae0c9474a3f Author: zentol s.mo...@web.de Date: 2015-08-23T13:36:47Z [FLINK-2565] Support primitive Arrays as keys
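The core of a primitive-array comparator is plain lexicographic comparison; a standalone sketch of that contract (illustrative only, not Flink's actual TypeComparator, which also covers serialization and normalized keys):

```java
// Lexicographic ordering of primitive int arrays: compare element by element,
// falling back to length so a shared prefix sorts before the longer array.
class IntArrayComparator implements java.util.Comparator<int[]> {
    @Override
    public int compare(int[] a, int[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int c = Integer.compare(a[i], b[i]);
            if (c != 0) return c;                   // first differing element decides
        }
        return Integer.compare(a.length, b.length); // shared prefix: shorter sorts first
    }
}
```

On Java 9+, java.util.Arrays.compare(int[], int[]) implements the same ordering directly; one such comparator per primitive type (boolean[], byte[], char[], ...) is what the PR adds.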
[jira] [Commented] (FLINK-2565) Support primitive arrays as keys
[ https://issues.apache.org/jira/browse/FLINK-2565?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14708409#comment-14708409 ] ASF GitHub Bot commented on FLINK-2565: --- GitHub user zentol opened a pull request: https://github.com/apache/flink/pull/1043 [FLINK-2565] Support primitive Arrays as keys Adds a comparator and test for every primitive array type. Modifies the CustomType2 class in GroupingTest to retain a field with an unsupported type. You can merge this pull request into a Git repository by running: $ git pull https://github.com/zentol/flink 2565_arrayKey Alternatively you can review and apply these changes as the patch at: https://github.com/apache/flink/pull/1043.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #1043 commit 7551a47e60186a91ecc1df364f1a3ae0c9474a3f Author: zentol s.mo...@web.de Date: 2015-08-23T13:36:47Z [FLINK-2565] Support primitive Arrays as keys Support primitive arrays as keys Key: FLINK-2565 URL: https://issues.apache.org/jira/browse/FLINK-2565 Project: Flink Issue Type: Improvement Components: Java API Reporter: Chesnay Schepler Assignee: Chesnay Schepler
[GitHub] flink pull request: [FLINK-2560] Flink-Avro Plugin cannot be handl...
Github user asfgit closed the pull request at: https://github.com/apache/flink/pull/1041
[jira] [Commented] (FLINK-2560) Flink-Avro Plugin cannot be handled by Eclipse
[ https://issues.apache.org/jira/browse/FLINK-2560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14708434#comment-14708434 ] ASF GitHub Bot commented on FLINK-2560: --- Github user asfgit closed the pull request at: https://github.com/apache/flink/pull/1041 Flink-Avro Plugin cannot be handled by Eclipse -- Key: FLINK-2560 URL: https://issues.apache.org/jira/browse/FLINK-2560 Project: Flink Issue Type: Improvement Reporter: Matthias J. Sax Assignee: Matthias J. Sax Priority: Trivial
[jira] [Commented] (FLINK-2557) Manual type information via returns fails in DataSet API
[ https://issues.apache.org/jira/browse/FLINK-2557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14708463#comment-14708463 ] Aljoscha Krettek commented on FLINK-2557: - Yes, I think that's ok. Manual type information via returns fails in DataSet API -- Key: FLINK-2557 URL: https://issues.apache.org/jira/browse/FLINK-2557 Project: Flink Issue Type: Bug Components: Java API Reporter: Matthias J. Sax Assignee: Chesnay Schepler
[GitHub] flink pull request: [FLINK-2030][FLINK-2274][core][utils]Histogram...
Github user chiwanpark commented on the pull request: https://github.com/apache/flink/pull/861#issuecomment-133870941 +1 for moving histogram functions into `DataSetUtils`. It would be helpful for range partitioning. I'll review this in the next days.
[jira] [Commented] (FLINK-2030) Implement an online histogram with Merging and equalization features
[ https://issues.apache.org/jira/browse/FLINK-2030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14708418#comment-14708418 ] ASF GitHub Bot commented on FLINK-2030: --- Github user chiwanpark commented on the pull request: https://github.com/apache/flink/pull/861#issuecomment-133870941 +1 for moving histogram functions into `DataSetUtils`. It would be helpful for range partitioning. I'll review this in the next days. Implement an online histogram with Merging and equalization features Key: FLINK-2030 URL: https://issues.apache.org/jira/browse/FLINK-2030 Project: Flink Issue Type: Sub-task Components: Machine Learning Library Reporter: Sachin Goel Assignee: Sachin Goel Priority: Minor Labels: ML For the implementation of the decision tree in https://issues.apache.org/jira/browse/FLINK-1727, we need to implement a histogram with online updates, merging and equalization features. A reference implementation is provided in [1]. [1] http://www.jmlr.org/papers/volume11/ben-haim10a/ben-haim10a.pdf
[jira] [Commented] (FLINK-2560) Flink-Avro Plugin cannot be handled by Eclipse
[ https://issues.apache.org/jira/browse/FLINK-2560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14708429#comment-14708429 ] ASF GitHub Bot commented on FLINK-2560: --- Github user chiwanpark commented on the pull request: https://github.com/apache/flink/pull/1041#issuecomment-133874756 Looks good to merge. I'll merge this. Flink-Avro Plugin cannot be handled by Eclipse -- Key: FLINK-2560 URL: https://issues.apache.org/jira/browse/FLINK-2560 Project: Flink Issue Type: Improvement Reporter: Matthias J. Sax Assignee: Matthias J. Sax Priority: Trivial
[GitHub] flink pull request: [FLINK-2560] Flink-Avro Plugin cannot be handl...
Github user chiwanpark commented on the pull request: https://github.com/apache/flink/pull/1041#issuecomment-133874756 Looks good to merge. I'll merge this.
[GitHub] flink pull request: [FLINK-2557] TypeExtractor properly returns Mi...
GitHub user zentol opened a pull request: https://github.com/apache/flink/pull/1045 [FLINK-2557] TypeExtractor properly returns MissingTypeInfo This fix is not really obvious, so let me explain: getParameterType() is called from two different places in the TypeExtractor: to validate the input type and to extract the output type. Both cases consider the possibility that getParameterType() fails, but check for different exceptions. The TypeExtractor only returns a MissingTypeInfo if it encounters an InvalidTypesException; IllegalArgumentExceptions are not caught. This is what @mjsax encountered. Changing the exception type causes the TypeExtractor to properly return a MissingTypeInfo, which is later overridden by the returns(...) call. In order for the input validation to still work properly as well, it now catches InvalidTypesExceptions instead. You can merge this pull request into a Git repository by running: $ git pull https://github.com/zentol/flink 2557_types Alternatively you can review and apply these changes as the patch at: https://github.com/apache/flink/pull/1045.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #1045 commit 1c1dc459915c875ab0a4412aa3ef0a844f092171 Author: zentol s.mo...@web.de Date: 2015-08-23T19:41:44Z [FLINK-2557] TypeExtractor properly returns MissingTypeInfo
[jira] [Commented] (FLINK-2556) Fix/Refactor pre-flight Key validation
[ https://issues.apache.org/jira/browse/FLINK-2556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14708412#comment-14708412 ] ASF GitHub Bot commented on FLINK-2556: --- GitHub user zentol opened a pull request: https://github.com/apache/flink/pull/1044 [FLINK-2556] Refactor/Fix pre-flight Key validation Removed redundant key validation in DistinctOperator Keys constructors now make sure the type of every key is an instance of AtomicType/CompositeType, and that type.isKeyType() is true. Additionally, the ExpressionKeys int[] constructor explicitly rejects Tuple0 Changes one test that actually tried something that shouldn't work in the first place. You can merge this pull request into a Git repository by running: $ git pull https://github.com/zentol/flink isKeyType_check Alternatively you can review and apply these changes as the patch at: https://github.com/apache/flink/pull/1044.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #1044 commit 7a57b6ef2ecdc7adaf770f8585cf8f974c684705 Author: zentol s.mo...@web.de Date: 2015-08-23T13:57:34Z [FLINK-2556] Refactor/Fix pre-flight Key validation Fix/Refactor pre-flight Key validation -- Key: FLINK-2556 URL: https://issues.apache.org/jira/browse/FLINK-2556 Project: Flink Issue Type: Bug Components: Java API Reporter: Chesnay Schepler Assignee: Chesnay Schepler The pre-flight key validation checks are inconsistent, at times don't actually check anything and in at least 1 case are done redundantly. For example, * you can group on a tuple containing a non-Atomic-/CompositeType using String[] KeyExpressions (see FLINK-2541) * you can group on an AtomicType even though isKeyType() returns false, if it is contained in a tuple * for distinct(String[]...) the above fails in the DistinctOperator constructor, as it validates the key again for some reason. 
[GitHub] flink pull request: [FLINK-2556] Refactor/Fix pre-flight Key valid...
GitHub user zentol opened a pull request: https://github.com/apache/flink/pull/1044 [FLINK-2556] Refactor/Fix pre-flight Key validation Removed redundant key validation in DistinctOperator Keys constructors now make sure the type of every key is an instance of AtomicType/CompositeType, and that type.isKeyType() is true. Additionally, the ExpressionKeys int[] constructor explicitly rejects Tuple0 Changes one test that actually tried something that shouldn't work in the first place. You can merge this pull request into a Git repository by running: $ git pull https://github.com/zentol/flink isKeyType_check Alternatively you can review and apply these changes as the patch at: https://github.com/apache/flink/pull/1044.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #1044 commit 7a57b6ef2ecdc7adaf770f8585cf8f974c684705 Author: zentol s.mo...@web.de Date: 2015-08-23T13:57:34Z [FLINK-2556] Refactor/Fix pre-flight Key validation
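The unified check the PR describes boils down to something like the following sketch. The interfaces here are stubs standing in for Flink's real TypeInformation/AtomicType/CompositeType classes, which carry much more:

```java
// Stub type hierarchy for illustration; only the key-related bit is modeled.
interface TypeInformation { boolean isKeyType(); }
interface AtomicType extends TypeInformation {}
interface CompositeType extends TypeInformation {}

class KeyValidator {
    // A key type must be atomic or composite, AND must report isKeyType().
    // Applying this one check everywhere closes the gaps listed in the issue
    // (e.g. an AtomicType nested in a tuple slipping through with isKeyType() == false).
    static void validateKeyType(TypeInformation type) {
        boolean knownKind = type instanceof AtomicType || type instanceof CompositeType;
        if (!knownKind || !type.isKeyType()) {
            throw new IllegalArgumentException("This type cannot be used as a key: " + type);
        }
    }
}
```

Centralizing the check also removes the redundant second validation in DistinctOperator that the PR mentions.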
[jira] [Commented] (FLINK-2557) Manual type information via returns fails in DataSet API
[ https://issues.apache.org/jira/browse/FLINK-2557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14708460#comment-14708460 ] Chesnay Schepler commented on FLINK-2557: - I have a fix ready so that a MissingTypeInfo is returned, but am unsure about the error message. Should raw types just be added to the list of unsupported things, like "Support for synthetic interfaces, lambdas, and generic or raw types is limited at this point"? Manual type information via returns fails in DataSet API -- Key: FLINK-2557 URL: https://issues.apache.org/jira/browse/FLINK-2557 Project: Flink Issue Type: Bug Components: Java API Reporter: Matthias J. Sax Assignee: Chesnay Schepler
[jira] [Updated] (FLINK-2563) Gelly's Graph Algorithm Interface is limited
[ https://issues.apache.org/jira/browse/FLINK-2563?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andra Lungu updated FLINK-2563: --- Summary: Gelly's Graph Algorithm Interface is limited (was: Gelly's Graph Algorithm Interface is limites) Gelly's Graph Algorithm Interface is limited Key: FLINK-2563 URL: https://issues.apache.org/jira/browse/FLINK-2563 Project: Flink Issue Type: Improvement Components: Gelly Affects Versions: 0.9, 0.10 Reporter: Andra Lungu Right now, Gelly's `GraphAlgorithm` interface only allows users/devs to return the same type of Graph: {code} public Graph<K, VV, EV> run(Graph<K, VV, EV> input) throws Exception; {code} In numerous cases, one needs to return a single value, or a modified graph. Off the top of my head, say one would like to implement a Triangle Count library method that takes a Graph as input and returns the total number of triangles: https://github.com/andralungu/gelly-partitioning/blob/master/src/main/java/example/GSATriangleCount.java With the current Gelly abstractions, something like this cannot be supported. Also, if I initially had a Graph<Long, Long, NullValue> and my algorithm changed the edge values to type Double, for instance, I would again have created an implementation which is not supported.
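One way to lift the restriction, sketched with a stub Graph class: parameterize the result type, so an algorithm can return a count, a DataSet, or a differently-typed graph. This is a sketch of the idea in the report, not necessarily the interface Gelly actually adopted.

```java
// Stub standing in for Gelly's Graph; the real class wraps vertex/edge DataSets.
class Graph<K, VV, EV> {
}

// Generalized interface: the result type T is independent of the input graph type.
interface GraphAlgorithm<K, VV, EV, T> {
    T run(Graph<K, VV, EV> input) throws Exception;
}

// A triangle-count style algorithm can now return a plain number
// (the count here is a fixed dummy value, purely for illustration) ...
class FixedTriangleCount implements GraphAlgorithm<Long, Long, Double, Long> {
    private final long triangles;
    FixedTriangleCount(long triangles) { this.triangles = triangles; }
    @Override
    public Long run(Graph<Long, Long, Double> input) { return triangles; }
}

// ... and a transformation can return a graph with a different edge type.
class ToDoubleEdges<K, VV> implements GraphAlgorithm<K, VV, Long, Graph<K, VV, Double>> {
    @Override
    public Graph<K, VV, Double> run(Graph<K, VV, Long> input) { return new Graph<>(); }
}
```

Both use cases from the issue (returning a single value, and changing the edge value type) type-check against the same interface.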
[jira] [Assigned] (FLINK-2563) Gelly's Graph Algorithm Interface is limited
[ https://issues.apache.org/jira/browse/FLINK-2563?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vasia Kalavri reassigned FLINK-2563: Assignee: Vasia Kalavri Gelly's Graph Algorithm Interface is limited Key: FLINK-2563 URL: https://issues.apache.org/jira/browse/FLINK-2563 Project: Flink Issue Type: Improvement Components: Gelly Affects Versions: 0.9, 0.10 Reporter: Andra Lungu Assignee: Vasia Kalavri
[jira] [Commented] (FLINK-2563) Gelly's Graph Algorithm Interface is limited
[ https://issues.apache.org/jira/browse/FLINK-2563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14708345#comment-14708345 ] Andra Lungu commented on FLINK-2563: It's all yours :) Gelly's Graph Algorithm Interface is limited Key: FLINK-2563 URL: https://issues.apache.org/jira/browse/FLINK-2563 Project: Flink Issue Type: Improvement Components: Gelly Affects Versions: 0.9, 0.10 Reporter: Andra Lungu
[jira] [Resolved] (FLINK-2548) In a VertexCentricIteration, the run time of one iteration should be proportional to the size of the workset
[ https://issues.apache.org/jira/browse/FLINK-2548?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gabor Gevay resolved FLINK-2548. Resolution: Won't Fix In a VertexCentricIteration, the run time of one iteration should be proportional to the size of the workset Key: FLINK-2548 URL: https://issues.apache.org/jira/browse/FLINK-2548 Project: Flink Issue Type: Improvement Components: Gelly Affects Versions: 0.9, 0.10 Reporter: Gabor Gevay Assignee: Gabor Gevay Currently, the performance of vertex centric iteration is suboptimal in those iterations where the workset is small, because the complexity of one iteration contains the number of edges and vertices of the graph because of coGroups: VertexCentricIteration.buildMessagingFunction does a coGroup between the edges and the workset, to get the neighbors to the messaging UDF. This is problematic from a performance point of view, because the coGroup UDF gets called on all the edge groups, including those that are not getting any messages. An analogous problem is present in VertexCentricIteration.createResultSimpleVertex at the creation of the updates: a coGroup happens between the messages and the solution set, which has the number of vertices of the graph included in its complexity. Both of these coGroups could be avoided by doing a join instead (with the same keys that the coGroup uses), and then a groupBy. The complexity of these operations would be dominated by the size of the workset, as opposed to the number of edges or vertices of the graph. The joins should have the edges and the solution set at the build side to achieve this complexity. (They will not be rebuilt at every iteration.) I made some experiments with this, and the initial results seem promising. On some workloads, this achieves a 2 times speedup, because later iterations often have quite small worksets, and these get a huge speedup from this. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
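The coGroup-versus-join argument above can be illustrated with a small self-contained sketch (plain Java collections, not the Flink API; all names here are illustrative): a coGroup-style pass visits every edge group regardless of the workset, while a join-style probe only touches the keys in the workset, so its cost tracks the workset size.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;

public class WorksetJoinSketch {

    // coGroup-style: the UDF is invoked for every edge group, even for
    // vertices that receive no messages, so the workset is irrelevant here.
    static int coGroupStyleTouches(Map<Long, List<Long>> edgesBySource, Set<Long> workset) {
        int touched = 0;
        for (Long src : edgesBySource.keySet()) {  // visits ALL edge groups
            touched++;
        }
        return touched;
    }

    // join-style: probe the (prebuilt) edge index only with workset keys,
    // so the work done is proportional to the workset size.
    static int joinStyleTouches(Map<Long, List<Long>> edgesBySource, Set<Long> workset) {
        int touched = 0;
        for (Long active : workset) {              // visits only active vertices
            if (edgesBySource.containsKey(active)) {
                touched++;
            }
        }
        return touched;
    }
}
```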
[jira] [Resolved] (FLINK-2541) TypeComparator creation fails for T2<T1<byte[]>, byte[]>
[ https://issues.apache.org/jira/browse/FLINK-2541?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chesnay Schepler resolved FLINK-2541. - Resolution: Invalid TypeComparator creation fails for T2<T1<byte[]>, byte[]> Key: FLINK-2541 URL: https://issues.apache.org/jira/browse/FLINK-2541 Project: Flink Issue Type: Bug Components: Java API Reporter: Chesnay Schepler Assignee: Chesnay Schepler When running the following job as a JavaProgramTest: {code} ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment(); DataSet<Tuple2<Tuple1<byte[]>, byte[]>> data = env.fromElements( new Tuple2<Tuple1<byte[]>, byte[]>( new Tuple1<byte[]>(new byte[]{1, 2}), new byte[]{1, 2, 3}), new Tuple2<Tuple1<byte[]>, byte[]>( new Tuple1<byte[]>(new byte[]{1, 2}), new byte[]{1, 2, 3})); data.groupBy("f0.f0") .reduceGroup(new DummyReduce<Tuple2<Tuple1<byte[]>, byte[]>>()) .print(); {code} with DummyReduce defined as {code} public static class DummyReduce<IN> implements GroupReduceFunction<IN, IN> { @Override public void reduce(Iterable<IN> values, Collector<IN> out) throws Exception { for (IN value : values) { out.collect(value); }}} {code} I encountered the following exception: Tuple comparator creation has a bug java.lang.IllegalArgumentException: Tuple comparator creation has a bug at org.apache.flink.api.java.typeutils.TupleTypeInfo.getNewComparator(TupleTypeInfo.java:131) at org.apache.flink.api.common.typeutils.CompositeType.createComparator(CompositeType.java:133) at org.apache.flink.api.common.typeutils.CompositeType.createComparator(CompositeType.java:122) at org.apache.flink.api.common.operators.base.GroupReduceOperatorBase.getTypeComparator(GroupReduceOperatorBase.java:155) at org.apache.flink.api.common.operators.base.GroupReduceOperatorBase.executeOnCollections(GroupReduceOperatorBase.java:184) at org.apache.flink.api.common.operators.CollectionExecutor.executeUnaryOperator(CollectionExecutor.java:236) at 
org.apache.flink.api.common.operators.CollectionExecutor.execute(CollectionExecutor.java:143) at org.apache.flink.api.common.operators.CollectionExecutor.executeUnaryOperator(CollectionExecutor.java:215) at org.apache.flink.api.common.operators.CollectionExecutor.execute(CollectionExecutor.java:143) at org.apache.flink.api.common.operators.CollectionExecutor.execute(CollectionExecutor.java:125) at org.apache.flink.api.common.operators.CollectionExecutor.executeDataSink(CollectionExecutor.java:176) at org.apache.flink.api.common.operators.CollectionExecutor.execute(CollectionExecutor.java:152) at org.apache.flink.api.common.operators.CollectionExecutor.execute(CollectionExecutor.java:125) at org.apache.flink.api.common.operators.CollectionExecutor.execute(CollectionExecutor.java:109) at org.apache.flink.api.java.CollectionEnvironment.execute(CollectionEnvironment.java:33) at org.apache.flink.test.util.CollectionTestEnvironment.execute(CollectionTestEnvironment.java:35) at org.apache.flink.test.util.CollectionTestEnvironment.execute(CollectionTestEnvironment.java:30) at org.apache.flink.api.java.DataSet.collect(DataSet.java:408) at org.apache.flink.api.java.DataSet.print(DataSet.java:1349) at org.apache.flink.languagebinding.api.java.python.AbstractPythonTest.testProgram(AbstractPythonTest.java:42) at org.apache.flink.test.util.JavaProgramTestBase.testJobCollectionExecution(JavaProgramTestBase.java:226) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47) at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44) at 
org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:271) at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70) at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50) at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238) at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:63) at
[jira] [Created] (FLINK-2565) Support primitive arrays as keys
Chesnay Schepler created FLINK-2565: --- Summary: Support primitive arrays as keys Key: FLINK-2565 URL: https://issues.apache.org/jira/browse/FLINK-2565 Project: Flink Issue Type: Improvement Components: Java API Reporter: Chesnay Schepler Assignee: Chesnay Schepler -- This message was sent by Atlassian JIRA (v6.3.4#6332)
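The underlying reason primitive arrays need dedicated key support, and the motivation behind this improvement, is that Java arrays inherit identity-based `equals`/`hashCode` from `Object`, so a grouping comparator has to compare array contents explicitly. A minimal self-contained demonstration (names are illustrative):

```java
import java.util.Arrays;

public class ArrayKeySemantics {

    // Object.equals on arrays is reference identity: two arrays with the
    // same contents are NOT equal, so they would land in different groups.
    public static boolean identityEquals(byte[] a, byte[] b) {
        return a.equals(b);
    }

    // Element-wise comparison, which is what an array key comparator
    // actually needs to perform.
    public static boolean contentEquals(byte[] a, byte[] b) {
        return Arrays.equals(a, b);
    }
}
```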
[jira] [Commented] (FLINK-2541) TypeComparator creation fails for T2<T1<byte[]>, byte[]>
[ https://issues.apache.org/jira/browse/FLINK-2541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14708379#comment-14708379 ] Chesnay Schepler commented on FLINK-2541: - Since this is not actually an issue of the TypeComparator, I will close this ticket and open a new one to support primitive arrays as keys. The validation checks will be extended in FLINK-2556. TypeComparator creation fails for T2<T1<byte[]>, byte[]> Key: FLINK-2541 URL: https://issues.apache.org/jira/browse/FLINK-2541 Project: Flink Issue Type: Bug Components: Java API Reporter: Chesnay Schepler Assignee: Chesnay Schepler When running the following job as a JavaProgramTest: {code} ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment(); DataSet<Tuple2<Tuple1<byte[]>, byte[]>> data = env.fromElements( new Tuple2<Tuple1<byte[]>, byte[]>( new Tuple1<byte[]>(new byte[]{1, 2}), new byte[]{1, 2, 3}), new Tuple2<Tuple1<byte[]>, byte[]>( new Tuple1<byte[]>(new byte[]{1, 2}), new byte[]{1, 2, 3})); data.groupBy("f0.f0") .reduceGroup(new DummyReduce<Tuple2<Tuple1<byte[]>, byte[]>>()) .print(); {code} with DummyReduce defined as {code} public static class DummyReduce<IN> implements GroupReduceFunction<IN, IN> { @Override public void reduce(Iterable<IN> values, Collector<IN> out) throws Exception { for (IN value : values) { out.collect(value); }}} {code} I encountered the following exception: Tuple comparator creation has a bug java.lang.IllegalArgumentException: Tuple comparator creation has a bug at org.apache.flink.api.java.typeutils.TupleTypeInfo.getNewComparator(TupleTypeInfo.java:131) at org.apache.flink.api.common.typeutils.CompositeType.createComparator(CompositeType.java:133) at org.apache.flink.api.common.typeutils.CompositeType.createComparator(CompositeType.java:122) at org.apache.flink.api.common.operators.base.GroupReduceOperatorBase.getTypeComparator(GroupReduceOperatorBase.java:155) at 
org.apache.flink.api.common.operators.base.GroupReduceOperatorBase.executeOnCollections(GroupReduceOperatorBase.java:184) at org.apache.flink.api.common.operators.CollectionExecutor.executeUnaryOperator(CollectionExecutor.java:236) at org.apache.flink.api.common.operators.CollectionExecutor.execute(CollectionExecutor.java:143) at org.apache.flink.api.common.operators.CollectionExecutor.executeUnaryOperator(CollectionExecutor.java:215) at org.apache.flink.api.common.operators.CollectionExecutor.execute(CollectionExecutor.java:143) at org.apache.flink.api.common.operators.CollectionExecutor.execute(CollectionExecutor.java:125) at org.apache.flink.api.common.operators.CollectionExecutor.executeDataSink(CollectionExecutor.java:176) at org.apache.flink.api.common.operators.CollectionExecutor.execute(CollectionExecutor.java:152) at org.apache.flink.api.common.operators.CollectionExecutor.execute(CollectionExecutor.java:125) at org.apache.flink.api.common.operators.CollectionExecutor.execute(CollectionExecutor.java:109) at org.apache.flink.api.java.CollectionEnvironment.execute(CollectionEnvironment.java:33) at org.apache.flink.test.util.CollectionTestEnvironment.execute(CollectionTestEnvironment.java:35) at org.apache.flink.test.util.CollectionTestEnvironment.execute(CollectionTestEnvironment.java:30) at org.apache.flink.api.java.DataSet.collect(DataSet.java:408) at org.apache.flink.api.java.DataSet.print(DataSet.java:1349) at org.apache.flink.languagebinding.api.java.python.AbstractPythonTest.testProgram(AbstractPythonTest.java:42) at org.apache.flink.test.util.JavaProgramTestBase.testJobCollectionExecution(JavaProgramTestBase.java:226) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47) at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44) at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:271) at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70) at
[jira] [Created] (FLINK-2564) Failing Test: RandomSamplerTest
Matthias J. Sax created FLINK-2564: -- Summary: Failing Test: RandomSamplerTest Key: FLINK-2564 URL: https://issues.apache.org/jira/browse/FLINK-2564 Project: Flink Issue Type: Bug Reporter: Matthias J. Sax {noformat} Tests run: 17, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 15.943 sec FAILURE! - in org.apache.flink.api.java.sampling. testPoissonSamplerFraction(org.apache.flink.api.java.sampling.RandomSamplerTest) Time elapsed: 0.017 sec FAILURE! java.lang.AssertionError: expected fraction: 0.01, result fraction: 0.011300 at org.junit.Assert.fail(Assert.java:88) at org.junit.Assert.assertTrue(Assert.java:41) at org.apache.flink.api.java.sampling.RandomSamplerTest.verifySamplerFraction(RandomSamplerTest.java:249) at org.apache.flink.api.java.sampling.RandomSamplerTest.testPoissonSamplerFraction(RandomSamplerTest.java:116) Results : Failed tests: RandomSamplerTest.testPoissonSamplerFraction:116-verifySamplerFraction:249 expected fraction: 0.01, result fraction: 0.011300 {noformat} Full log: https://travis-ci.org/apache/flink/jobs/76720572 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[GitHub] flink pull request: [FLINK-2563] [gelly] changed Graph's run() to ...
GitHub user vasia opened a pull request: https://github.com/apache/flink/pull/1042 [FLINK-2563] [gelly] changed Graph's run() to return an arbitrary result type Added a type parameter to the 'GraphAlgorithm' interface to allow implementing library methods that return single values, Graphs of different types, etc. You can merge this pull request into a Git repository by running: $ git pull https://github.com/vasia/flink graphAlgorithm Alternatively you can review and apply these changes as the patch at: https://github.com/apache/flink/pull/1042.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #1042 commit ff7240b9fd1a899c938108604875ea16023e7a78 Author: vasia va...@apache.org Date: 2015-08-23T11:06:15Z [FLINK-2563] [gelly] extended the run() method of GraphAlgorithm interface to return an arbitrary type --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[jira] [Commented] (FLINK-2563) Gelly's Graph Algorithm Interface is limited
[ https://issues.apache.org/jira/browse/FLINK-2563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14708391#comment-14708391 ] ASF GitHub Bot commented on FLINK-2563: --- GitHub user vasia opened a pull request: https://github.com/apache/flink/pull/1042 [FLINK-2563] [gelly] changed Graph's run() to return an arbitrary result type Added a type parameter to the 'GraphAlgorithm' interface to allow implementing library methods that return single values, Graphs of different types, etc. You can merge this pull request into a Git repository by running: $ git pull https://github.com/vasia/flink graphAlgorithm Alternatively you can review and apply these changes as the patch at: https://github.com/apache/flink/pull/1042.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #1042 commit ff7240b9fd1a899c938108604875ea16023e7a78 Author: vasia va...@apache.org Date: 2015-08-23T11:06:15Z [FLINK-2563] [gelly] extended the run() method of GraphAlgorithm interface to return an arbitrary type Gelly's Graph Algorithm Interface is limited Key: FLINK-2563 URL: https://issues.apache.org/jira/browse/FLINK-2563 Project: Flink Issue Type: Improvement Components: Gelly Affects Versions: 0.9, 0.10 Reporter: Andra Lungu Assignee: Vasia Kalavri Right now, Gelly's `GraphAlgorithm` interface only allows users/devs to return the same type of Graph. public Graph<K, VV, EV> run(Graph<K, VV, EV> input) throws Exception; In numerous cases, one needs to return a single value, or a modified graph. Off the top of my head, say one would like to implement a Triangle Count library method that takes as input a Graph and returns the total number of triangles. https://github.com/andralungu/gelly-partitioning/blob/master/src/main/java/example/GSATriangleCount.java With the current Gelly abstractions, something like this cannot be supported. Also, if I initially had a Graph<Long, Long, NullValue> and my algorithm changed the edge values to type Double, for instance, I would again have created an implementation which is not supported. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-2548) In a VertexCentricIteration, the run time of one iteration should be proportional to the size of the workset
[ https://issues.apache.org/jira/browse/FLINK-2548?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14708364#comment-14708364 ] Gabor Gevay commented on FLINK-2548: OK, you are probably right. I ran some more tests, and it seems that the issue in my use case is more with the serialization. In other cases, when the serialization of the vertex IDs is cheaper, then the coGroup implementation does OK with respect to the run time of one iteration following the workset size. In a VertexCentricIteration, the run time of one iteration should be proportional to the size of the workset Key: FLINK-2548 URL: https://issues.apache.org/jira/browse/FLINK-2548 Project: Flink Issue Type: Improvement Components: Gelly Affects Versions: 0.9, 0.10 Reporter: Gabor Gevay Assignee: Gabor Gevay Currently, the performance of vertex centric iteration is suboptimal in those iterations where the workset is small, because the complexity of one iteration contains the number of edges and vertices of the graph because of coGroups: VertexCentricIteration.buildMessagingFunction does a coGroup between the edges and the workset, to get the neighbors to the messaging UDF. This is problematic from a performance point of view, because the coGroup UDF gets called on all the edge groups, including those that are not getting any messages. An analogous problem is present in VertexCentricIteration.createResultSimpleVertex at the creation of the updates: a coGroup happens between the messages and the solution set, which has the number of vertices of the graph included in its complexity. Both of these coGroups could be avoided by doing a join instead (with the same keys that the coGroup uses), and then a groupBy. The complexity of these operations would be dominated by the size of the workset, as opposed to the number of edges or vertices of the graph. The joins should have the edges and the solution set at the build side to achieve this complexity. 
(They will not be rebuilt at every iteration.) I made some experiments with this, and the initial results seem promising. On some workloads, this achieves a 2 times speedup, because later iterations often have quite small worksets, and these get a huge speedup from this. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (FLINK-2566) FlinkTopologyContext not populated completely
Matthias J. Sax created FLINK-2566: -- Summary: FlinkTopologyContext not populated completely Key: FLINK-2566 URL: https://issues.apache.org/jira/browse/FLINK-2566 Project: Flink Issue Type: Improvement Components: flink-contrib Reporter: Matthias J. Sax Assignee: Matthias J. Sax Priority: Minor Currently, FlinkTopologyContext is not populated completely. It only contains enough information to make the WordCount example work. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-2563) Gelly's Graph Algorithm Interface is limited
[ https://issues.apache.org/jira/browse/FLINK-2563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14708342#comment-14708342 ] Vasia Kalavri commented on FLINK-2563: -- That's a big limitation, I agree. Mind if I work on this [~andralungu]? Gelly's Graph Algorithm Interface is limited Key: FLINK-2563 URL: https://issues.apache.org/jira/browse/FLINK-2563 Project: Flink Issue Type: Improvement Components: Gelly Affects Versions: 0.9, 0.10 Reporter: Andra Lungu Right now, Gelly's `GraphAlgorithm` interface only allows users/devs to return the same type of Graph. public Graph<K, VV, EV> run(Graph<K, VV, EV> input) throws Exception; In numerous cases, one needs to return a single value, or a modified graph. Off the top of my head, say one would like to implement a Triangle Count library method that takes as input a Graph and returns the total number of triangles. https://github.com/andralungu/gelly-partitioning/blob/master/src/main/java/example/GSATriangleCount.java With the current Gelly abstractions, something like this cannot be supported. Also, if I initially had a Graph<Long, Long, NullValue> and my algorithm changed the edge values to type Double, for instance, I would again have created an implementation which is not supported. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-1901) Create sample operator for Dataset
[ https://issues.apache.org/jira/browse/FLINK-1901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14708658#comment-14708658 ] ASF GitHub Bot commented on FLINK-1901: --- Github user ChengXiangLi commented on the pull request: https://github.com/apache/flink/pull/949#issuecomment-133990256 Hi @sachingoel0101, when sampling with a fraction, it's not easy to verify whether the DataSet was sampled with the input fraction. In the test, I take 5 samples, use the average size to compute the result fraction, and then compare the result fraction with the input fraction, verifying that their difference is not more than 10%. The following case may happen as well: the sampler samples the DataSet with the input fraction, but the sampled result size is too small or too large to pass our verification condition. This happens with very small probability, say less than 0.001 in this test. It should be OK if this failure happens very occasionally; please let me know if you find it's not. Create sample operator for Dataset -- Key: FLINK-1901 URL: https://issues.apache.org/jira/browse/FLINK-1901 Project: Flink Issue Type: Improvement Components: Core Reporter: Theodore Vasiloudis Assignee: Chengxiang Li In order to be able to implement Stochastic Gradient Descent and a number of other machine learning algorithms we need to have a way to take a random sample from a Dataset. We need to be able to sample with or without replacement from the Dataset, choose the relative or exact size of the sample, set a seed for reproducibility, and support sampling within iterations. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (FLINK-2373) Add configuration parameter to createRemoteEnvironment method
[ https://issues.apache.org/jira/browse/FLINK-2373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14708693#comment-14708693 ] Andreas Kunft edited comment on FLINK-2373 at 8/24/15 2:39 AM: --- Hey, i was just going to open a PR which had basically also another method for the remote environment with the extra configuration parameter, just like yours. So I guess, the PR is now obsolete (you can see my changes here: https://github.com/akunft/flink/commit/60240632ed71c072ecf880a586f12fd966412d67). As far as I see it, there is a difference in the configuration provided for the remote environment and the local one, as the remote config is only used for def. parallelism and the akka config for the job client and not for configuration of the cluster itself. The config for the local execution covers all the configuration. I think it should be stated clearly in the java doc, that the config is only for the jobclient and def. parallelism in case of the remote environment. was (Author: akunft): Hey, i was just going to open a PR which had basically also another method for the remote environment with the extra configuration parameter, just like yours. So I guess, the PR is now obsolete. As far as I see it, there is a difference in the configuration provided for the remote environment and the local one, as the remote config is only used for def. parallelism and the akka config for the job client and not for configuration of the cluster itself. The config for the local execution covers all the configuration. I think it should be stated clearly in the java doc, that the config is only for the jobclient and def. parallelism in case of the remote environment. 
Add configuration parameter to createRemoteEnvironment method - Key: FLINK-2373 URL: https://issues.apache.org/jira/browse/FLINK-2373 Project: Flink Issue Type: Bug Components: other Reporter: Andreas Kunft Priority: Minor Original Estimate: 24h Remaining Estimate: 24h Currently there is no way to provide a custom configuration upon creation of a remote environment (via ExecutionEnvironment.createRemoteEnvironment(...)). This leads to errors when the submitted job exceeds the default value for the max. payload size in Akka, as we can not increase the configuration value (akka.remote.OversizedPayloadException: Discarding oversized payload...) Providing an overloaded method with a configuration parameter for the remote environment fixes that. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-2373) Add configuration parameter to createRemoteEnvironment method
[ https://issues.apache.org/jira/browse/FLINK-2373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14708693#comment-14708693 ] Andreas Kunft commented on FLINK-2373: -- Hey, I was just going to open a PR which basically also had another method for the remote environment with the extra configuration parameter, just like yours. So I guess the PR is now obsolete. As far as I can see, there is a difference in the configuration provided for the remote environment and the local one: the remote config is only used for the default parallelism and the Akka config for the job client, not for configuration of the cluster itself, whereas the config for local execution covers all the configuration. I think it should be stated clearly in the Javadoc that, in the case of the remote environment, the config only applies to the job client and the default parallelism. Add configuration parameter to createRemoteEnvironment method - Key: FLINK-2373 URL: https://issues.apache.org/jira/browse/FLINK-2373 Project: Flink Issue Type: Bug Components: other Reporter: Andreas Kunft Priority: Minor Original Estimate: 24h Remaining Estimate: 24h Currently there is no way to provide a custom configuration upon creation of a remote environment (via ExecutionEnvironment.createRemoteEnvironment(...)). This leads to errors when the submitted job exceeds the default value for the max. payload size in Akka, as we can not increase the configuration value (akka.remote.OversizedPayloadException: Discarding oversized payload...) Providing an overloaded method with a configuration parameter for the remote environment fixes that. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (FLINK-2373) Add configuration parameter to createRemoteEnvironment method
[ https://issues.apache.org/jira/browse/FLINK-2373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14708693#comment-14708693 ] Andreas Kunft edited comment on FLINK-2373 at 8/24/15 2:29 AM: --- Hey, i was just going to open a PR which had basically also another method for the remote environment with the extra configuration parameter, just like yours. So I guess, the PR is now obsolete. As far as I see it, there is a difference in the configuration provided for the remote environment and the local one, as the remote config is only used for def. parallelism and the akka config for the job client and not for configuration of the cluster itself. The config for the local execution covers all the configuration. I think it should be stated clearly in the java doc, that the config is only for the jobclient and def. parallelism in case of the remote environment. was (Author: akunft): Hey, i was just going to open a PR which had basically also another method for the remote environment with the extra configuration parameter, just like yours. So I guess, the PR is now obsolete. As far as I see it, there is a difference for the configuration provided for the remote environment and the local one, as the remote config is only used for def. parallelism and the akka config for the job client and not for configuration of the cluster itself, as the config for the local execution covers all the configuration. I think it should be stated clearly in the java doc, that the config is only for the jobclient and def. parallelism in case of the remote environment. 
Add configuration parameter to createRemoteEnvironment method - Key: FLINK-2373 URL: https://issues.apache.org/jira/browse/FLINK-2373 Project: Flink Issue Type: Bug Components: other Reporter: Andreas Kunft Priority: Minor Original Estimate: 24h Remaining Estimate: 24h Currently there is no way to provide a custom configuration upon creation of a remote environment (via ExecutionEnvironment.createRemoteEnvironment(...)). This leads to errors when the submitted job exceeds the default value for the max. payload size in Akka, as we can not increase the configuration value (akka.remote.OversizedPayloadException: Discarding oversized payload...) Providing an overloaded method with a configuration parameter for the remote environment fixes that. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
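Once an overload with a configuration parameter exists, the fix discussed in this thread might be used roughly as follows. This is a hypothetical sketch: the exact overload signature, the `akka.framesize` key, and its value format are assumptions, not confirmed by this thread, and the snippet requires a running Flink cluster.

```java
// Hypothetical usage of the proposed overload (signature assumed):
Configuration clientConfig = new Configuration();
// Raise Akka's max payload size so large job graphs are not rejected with
// akka.remote.OversizedPayloadException (key and value are assumptions):
clientConfig.setString("akka.framesize", "104857600b");
ExecutionEnvironment env = ExecutionEnvironment.createRemoteEnvironment(
        "jobmanager-host", 6123, clientConfig, "path/to/job.jar");
```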
[jira] [Commented] (FLINK-1901) Create sample operator for Dataset
[ https://issues.apache.org/jira/browse/FLINK-1901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14708701#comment-14708701 ] ASF GitHub Bot commented on FLINK-1901: --- Github user chiwanpark commented on the pull request: https://github.com/apache/flink/pull/949#issuecomment-133999705 @ChengXiangLi I know that it is hard to verify a random sampler implementation. But we need to fix this failing test because it makes verifying other pull requests difficult. Some tests of other pull requests fail because of the K-S test and the sampling test with fraction. There is a [JIRA issue](https://issues.apache.org/jira/browse/FLINK-2564) covering this. I'm testing with an increased count of samples and source size. If I get a notable result, I'll post it. Create sample operator for Dataset -- Key: FLINK-1901 URL: https://issues.apache.org/jira/browse/FLINK-1901 Project: Flink Issue Type: Improvement Components: Core Reporter: Theodore Vasiloudis Assignee: Chengxiang Li In order to be able to implement Stochastic Gradient Descent and a number of other machine learning algorithms we need to have a way to take a random sample from a Dataset. We need to be able to sample with or without replacement from the Dataset, choose the relative or exact size of the sample, set a seed for reproducibility, and support sampling within iterations. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[GitHub] flink pull request: [FLINK-1901] [core] Create sample operator for...
Github user chiwanpark commented on the pull request: https://github.com/apache/flink/pull/949#issuecomment-133999705 @ChengXiangLi I know that it is hard to verify a random sampler implementation. But we need to fix this failing test because it makes verifying other pull requests difficult. Some tests of other pull requests fail because of the K-S test and the sampling test with fraction. There is a [JIRA issue](https://issues.apache.org/jira/browse/FLINK-2564) covering this. I'm testing with an increased count of samples and source size. If I get a notable result, I'll post it. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] flink pull request: [FLINK-1901] [core] Create sample operator for...
Github user ChengXiangLi commented on the pull request: https://github.com/apache/flink/pull/949#issuecomment-133990256 Hi @sachingoel0101, when sampling with a fraction, it's not easy to verify whether the DataSet was sampled with the input fraction. In the test, I take 5 samples, use the average size to compute the result fraction, and then compare the result fraction with the input fraction, verifying that their difference is not more than 10%. The following case may happen as well: the sampler samples the DataSet with the input fraction, but the sampled result size is too small or too large to pass our verification condition. This happens with very small probability, say less than 0.001 in this test. It should be OK if this failure happens very occasionally; please let me know if you find it's not. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
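The verification strategy described in the comment above can be sketched as a self-contained check (illustrative names, not Flink's actual test code): draw several Bernoulli samples, average the observed fraction, and accept if it lies within 10% of the requested fraction.

```java
import java.util.Random;

public class FractionCheck {

    // Sample `trials` times from a source of `sourceSize` elements, each
    // element kept independently with probability `fraction` (Bernoulli
    // sampling), then compare the average observed fraction against the
    // requested one with a 10% relative tolerance.
    static boolean fractionWithinTolerance(double fraction, int sourceSize,
                                           int trials, long seed) {
        Random rnd = new Random(seed);
        long totalSampled = 0;
        for (int t = 0; t < trials; t++) {
            for (int i = 0; i < sourceSize; i++) {
                if (rnd.nextDouble() < fraction) {
                    totalSampled++;
                }
            }
        }
        double observed = (double) totalSampled / (trials * (double) sourceSize);
        return Math.abs(observed - fraction) / fraction <= 0.10;
    }
}
```

With a small requested fraction (like the 0.01 in the failing test), the relative variance of the observed fraction is larger, which is exactly why rare spurious failures are expected; increasing the sample count or source size tightens the estimate.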