[jira] [Commented] (FLINK-3579) Improve String concatenation
[ https://issues.apache.org/jira/browse/FLINK-3579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15186694#comment-15186694 ] Timo Walther commented on FLINK-3579: - Hey, thanks for your interest. I think no one is working on this issue right now, so you can work on it if you like. Should I assign it to you? Maybe this issue has the same problem as FLINK-3086. > Improve String concatenation > > > Key: FLINK-3579 > URL: https://issues.apache.org/jira/browse/FLINK-3579 > Project: Flink > Issue Type: Bug > Components: Table API >Reporter: Timo Walther >Priority: Minor > > Concatenation of a String and non-String does not work properly. > e.g. {{f0 + 42}} leads to RelBuilder Exception > ExpressionParser does not like {{f0 + 42.cast(STRING)}} either. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
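The type mismatch behind this report can be seen outside Flink as well. Below is a standalone Python sketch (not Table API code; the helper name `concat` is hypothetical) of why a string/non-string concatenation needs an explicit cast on the non-string operand:

```python
# Standalone illustration of the issue's type problem: a string plus a
# non-string fails until the non-string operand is cast explicitly, which
# mirrors what f0 + 42 would need in the Table API.
def concat(field_value, other):
    """Concatenate a string field with any value by casting the value first."""
    return field_value + str(other)

try:
    result = "f0-" + 42  # mixing str and int raises TypeError in Python
except TypeError:
    result = concat("f0-", 42)  # explicit cast makes it work
```

A Table API fix would presumably insert an equivalent cast automatically during plan translation.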
[jira] [Commented] (FLINK-1707) Add an Affinity Propagation Library Method
[ https://issues.apache.org/jira/browse/FLINK-1707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15186662#comment-15186662 ] Vasia Kalavri commented on FLINK-1707: -- Hi Josep, no problem! I'll look into your comments. I'm guessing most of them will be resolved with a small example. Just ping me if you need help! -Vasia. > Add an Affinity Propagation Library Method > -- > > Key: FLINK-1707 > URL: https://issues.apache.org/jira/browse/FLINK-1707 > Project: Flink > Issue Type: New Feature > Components: Gelly > Reporter: Vasia Kalavri > Assignee: Josep Rubió > Priority: Minor > Labels: requires-design-doc > Attachments: Binary_Affinity_Propagation_in_Flink_design_doc.pdf > > > This issue proposes adding an implementation of the Affinity Propagation > algorithm as a Gelly library method and a corresponding example. > The algorithm is described in paper [1] and a description of a vertex-centric > implementation can be found in [2]. > [1]: http://www.psi.toronto.edu/affinitypropagation/FreyDueckScience07.pdf > [2]: http://event.cwi.nl/grades2014/00-ching-slides.pdf > Design doc: > https://docs.google.com/document/d/1QULalzPqMVICi8jRVs3S0n39pell2ZVc7RNemz_SGA4/edit?usp=sharing
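For orientation, the update rules from the Frey & Dueck paper cited as [1] can be sketched in a few lines of standalone Python. This is not Gelly code and not the proposed library method; the damping factor, iteration count, and median-similarity preference are illustrative choices:

```python
# Minimal Affinity Propagation sketch (Frey & Dueck 2007): points exchange
# "responsibility" and "availability" messages until exemplars emerge.
# Pure Python over 1-D points; all parameter choices are illustrative.
def affinity_propagation(points, damping=0.5, iterations=200):
    n = len(points)
    # Similarity = negative squared distance; diagonal = shared "preference".
    s = [[-(points[i] - points[k]) ** 2 for k in range(n)] for i in range(n)]
    off = sorted(s[i][k] for i in range(n) for k in range(n) if i != k)
    pref = off[len(off) // 2]  # median similarity as preference
    for i in range(n):
        s[i][i] = pref
    r = [[0.0] * n for _ in range(n)]  # responsibilities r(i, k)
    a = [[0.0] * n for _ in range(n)]  # availabilities a(i, k)
    for _ in range(iterations):
        # r(i,k) <- s(i,k) - max_{k' != k} (a(i,k') + s(i,k')), damped
        for i in range(n):
            for k in range(n):
                best = max(a[i][kk] + s[i][kk] for kk in range(n) if kk != k)
                r[i][k] = damping * r[i][k] + (1 - damping) * (s[i][k] - best)
        # a(i,k) <- min(0, r(k,k) + sum of positive responsibilities), damped
        for i in range(n):
            for k in range(n):
                pos = sum(max(0.0, r[ii][k]) for ii in range(n) if ii not in (i, k))
                new = pos if i == k else min(0.0, r[k][k] + pos)
                a[i][k] = damping * a[i][k] + (1 - damping) * new
    # Each point's exemplar maximizes availability + responsibility.
    return [max(range(n), key=lambda k: a[i][k] + r[i][k]) for i in range(n)]
```

In a vertex-centric formulation like [2], each point becomes a vertex and the r/a updates become the messages exchanged per superstep.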
[jira] [Commented] (FLINK-2997) Support range partition with user customized data distribution.
[ https://issues.apache.org/jira/browse/FLINK-2997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15186355#comment-15186355 ] ASF GitHub Bot commented on FLINK-2997: --- GitHub user gallenvara opened a pull request: https://github.com/apache/flink/pull/1776 [FLINK-2997] Support range partition with user customized data distribution. Sometimes users have better knowledge of the source data, and they can build a customized `data distribution` to do range partitioning more efficiently. You can merge this pull request into a Git repository by running: $ git pull https://github.com/gallenvara/flink flink-2997 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/flink/pull/1776.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #1776 commit 1e00767da3e83d688847ffe5c4ab77cab68f2062 Author: gallenvara Date: 2016-01-27T17:34:10Z Enable range partition with custom data distribution. > Support range partition with user customized data distribution. > --- > > Key: FLINK-2997 > URL: https://issues.apache.org/jira/browse/FLINK-2997 > Project: Flink > Issue Type: New Feature > Reporter: Chengxiang Li > > This is a follow-up to FLINK-7; sometimes users have better knowledge of > the source data, and they can build a customized data distribution to do range > partitioning more efficiently.
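The core idea of the PR is that a user who already knows the data's value range can supply partition boundaries up front, so records can be routed without a sampling pass. A standalone sketch of that routing step (names are illustrative, not Flink's API):

```python
import bisect

# Range partitioning with user-supplied boundaries: a record's key is routed
# to a partition by binary search over the sorted boundary list, so no
# sampling of the data is needed to balance the partitions.
def make_range_partitioner(boundaries):
    """boundaries: sorted upper bounds; partition p holds keys in
    (boundaries[p-1], boundaries[p]], the last partition takes the rest."""
    def partition(key):
        return bisect.bisect_left(boundaries, key)
    return partition

# With boundaries [10, 20]: keys <= 10 -> partition 0,
# 10 < key <= 20 -> partition 1, key > 20 -> partition 2.
part = make_range_partitioner([10, 20])
```

A skewed distribution is handled the same way: the user simply picks unequal boundary spacing that equalizes record counts per partition.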
[jira] [Commented] (FLINK-3207) Add a Pregel iteration abstraction to Gelly
[ https://issues.apache.org/jira/browse/FLINK-3207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15185806#comment-15185806 ] ASF GitHub Bot commented on FLINK-3207: --- Github user s1ck commented on the pull request: https://github.com/apache/flink/pull/1575#issuecomment-193968765 Hi @vasia, this is a nice PR. I commented on some minor documentation / naming issues. +1 for merging > Add a Pregel iteration abstraction to Gelly > --- > > Key: FLINK-3207 > URL: https://issues.apache.org/jira/browse/FLINK-3207 > Project: Flink > Issue Type: New Feature > Components: Gelly > Reporter: Vasia Kalavri > Assignee: Vasia Kalavri > > This issue proposes to add a Pregel/Giraph-like iteration abstraction to > Gelly that will only expose one UDF to the user, {{compute()}}. {{compute()}} > will have access to both the vertex state and the incoming messages, and will > be able to produce messages and update the vertex value.
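The single-UDF model the issue describes can be sketched outside of Gelly as a plain superstep loop. The following standalone Python sketch (all names illustrative, not the Gelly API) runs single-source shortest paths, where the per-vertex compute step sees the vertex value and the inbox, may update the value, and emits messages along out-edges:

```python
# Pregel-style superstep loop: each vertex is woken only by incoming
# messages, combines them with its value, and sends messages to neighbors.
def pregel_sssp(edges, source, num_vertices, max_supersteps=20):
    INF = float("inf")
    value = {v: INF for v in range(num_vertices)}   # vertex state: distance
    inbox = {v: [] for v in range(num_vertices)}
    inbox[source] = [0]                              # kick off the source
    for _ in range(max_supersteps):
        outbox = {v: [] for v in range(num_vertices)}
        active = False
        for v in range(num_vertices):
            if not inbox[v]:
                continue  # halted until messages arrive
            candidate = min(inbox[v])                # combine incoming messages
            if candidate < value[v]:
                value[v] = candidate                 # update vertex value
                for src, dst, weight in edges:       # send along out-edges
                    if src == v:
                        outbox[dst].append(candidate + weight)
                active = True
        inbox = outbox
        if not active:
            break  # no vertex improved: converged
    return value
```

The combine/update/send body of the inner loop is exactly what a single `compute()` UDF would contain.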
[jira] [Commented] (FLINK-1745) Add exact k-nearest-neighbours algorithm to machine learning library
[ https://issues.apache.org/jira/browse/FLINK-1745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15185789#comment-15185789 ] ASF GitHub Bot commented on FLINK-1745: --- Github user danielblazevski closed the pull request at: https://github.com/apache/flink/pull/1220 > Add exact k-nearest-neighbours algorithm to machine learning library > > > Key: FLINK-1745 > URL: https://issues.apache.org/jira/browse/FLINK-1745 > Project: Flink > Issue Type: New Feature > Components: Machine Learning Library > Reporter: Till Rohrmann > Assignee: Daniel Blazevski > Labels: ML, Starter > > Even though the k-nearest-neighbours (kNN) [1,2] algorithm is quite trivial, > it is still used as a means to classify data and to do regression. This issue > focuses on the implementation of an exact kNN (H-BNLJ, H-BRJ) algorithm as > proposed in [2]. > Could be a starter task. > Resources: > [1] [http://en.wikipedia.org/wiki/K-nearest_neighbors_algorithm] > [2] [https://www.cs.utah.edu/~lifeifei/papers/mrknnj.pdf]
[jira] [Commented] (FLINK-1745) Add exact k-nearest-neighbours algorithm to machine learning library
[ https://issues.apache.org/jira/browse/FLINK-1745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15185790#comment-15185790 ] ASF GitHub Bot commented on FLINK-1745: --- GitHub user danielblazevski reopened a pull request: https://github.com/apache/flink/pull/1220 [FLINK-1745] Add exact k-nearest-neighbours algorithm to machine learning library I added a quadtree data structure for the kNN algorithm. @chiwanpark originally made a pull request for a kNN algorithm, and we coordinated so that I could incorporate a tree structure. The quadtree scales very well with the number of training + test points, but scales poorly with the dimension (even the R-tree scales poorly with the dimension). I added a flag that automatically determines whether or not to use the quadtree. My implementation needed to use the Euclidean or SquaredEuclidean distance since I needed a specific notion of the distance between a test point and a box in the quadtree. I added another test, KNNQuadTreeSuite, in addition to Chiwan Park's KNNITSuite, since C. Park's parameters will automatically choose the brute-force non-quadtree method. 
For more details on the quadtree + how I used it for the KNN query, please see another branch I created that has a README.md: https://github.com/danielblazevski/flink/tree/FLINK-1745-devel/flink-staging/flink-ml/src/main/scala/org/apache/flink/ml/nn You can merge this pull request into a Git repository by running: $ git pull https://github.com/danielblazevski/flink FLINK-1745 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/flink/pull/1220.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #1220 commit c7e5056c6d273f6f0f841f77e0fdd91ca221602d Author: Chiwan Park Date: 2015-06-30T08:41:25Z [FLINK-1745] [ml] Add exact k-nearest-neighbor join commit 9d0c7942c09086324fadb29bdce749683a0d1a7e Author: danielblazevski Date: 2015-09-15T21:49:05Z modified kNN test to familiarize with Flink and KNN.scala commit 611248e57166dc549f86f805b590dd4e45cb3df5 Author: danielblazevski Date: 2015-09-15T21:49:17Z modified kNN test to familiarize with Flink and KNN.scala commit 1fd8231ce194b52b5a1bd55bbc5e135b3fa5775b Author: danielblazevski Date: 2015-09-16T01:26:57Z nightly commit, minor changes: got the filter to work, working on mapping the training set to include box lables commit 15d7d2cb308b23e24c43d103b85a76b0e665cbd3 Author: danielblazevski Date: 2015-09-22T02:02:51Z commit before incporporating quadtree commit 8f2da8a66516565c59df8828de2715b45397cb7f Author: danielblazevski Date: 2015-09-22T15:49:25Z did a basic import of QuadTree and Test; to-do: modify QuadTree to allow KNN.scala to make use of commit e1cef2c5aea65c6f204caeff6348e2778231f98d Author: danielblazevski Date: 2015-09-22T21:03:04Z transfered ListBuffers for objects in leaf nodes to Vectors commit c3387ef2ef59734727b56ea652fdb29af957d20b Author: danielblazevski Date: 2015-09-23T00:41:29Z basic test on 2D unit box seems to work -- need to generalize, e.g. 
to include automated bounding box commit 48294ff37a5f800e5111280da5a3c03f4375028d Author: danielblazevski Date: 2015-09-23T15:03:06Z had to debug quadtree -- back to testing 2D commit 6403ba14e240ed8d67a296ac789e7e00dece800d Author: danielblazevski Date: 2015-09-23T15:22:46Z Testing 2D looks good, strong improvement in run time compared to brute-force method commit 426466a40bc2625f390fe0d912f56a346e46c8f8 Author: danielblazevski Date: 2015-09-23T19:04:52Z added automated detection of bounding box based on min/max values of both training and test sets commit c35543b828384aa4ce04d56dfcb3d73db46d1e6d Author: danielblazevski Date: 2015-09-24T00:28:56Z added automated radius about test point to define localized neighborhood, result runs. TO-DO: Lots of tests commit 8e2d2e78f8533d4192aebe9b4baa7efbfa5928a5 Author: danielblazevski Date: 2015-09-24T00:54:06Z Note for future: previous commit passed test of Chiwan Park had in intial knn implementation commit d6fd40cb88d6e198e52c368e829bf7d32d432081 Author: danielblazevski Date: 2015-09-24T01:56:38Z Note for future: previous commit passed 3D version of the test that Chiwan Park had in the intial knn implementation commit 0ec1f4866157ca073341672e7fe9a50871ac0b7c Author:
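The brute-force baseline this PR starts from is easy to state outside Flink; the quadtree variant only changes which training points are considered as candidates. A standalone Python sketch (not the PR's Scala code; names are illustrative):

```python
import heapq
import math

# Exact brute-force kNN: compare the test point against every training
# point and keep the k with the smallest Euclidean distance. A quadtree
# would prune most of the candidates before this comparison.
def knn(train, test_point, k):
    """Return the k training points nearest to test_point."""
    return heapq.nsmallest(k, train, key=lambda p: math.dist(p, test_point))
```

This is why the tree pays off with many points (fewer candidates per query) but not in high dimensions, where spatial pruning becomes ineffective.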
[jira] [Commented] (FLINK-1745) Add exact k-nearest-neighbours algorithm to machine learning library
[ https://issues.apache.org/jira/browse/FLINK-1745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15185785#comment-15185785 ] ASF GitHub Bot commented on FLINK-1745: --- Github user danielblazevski commented on the pull request: https://github.com/apache/flink/pull/1220#issuecomment-193964375 Hi @chiwanpark, I modified the tests and corrected the package + import statements, please have a look. I will add more details soon. > Add exact k-nearest-neighbours algorithm to machine learning library > > > Key: FLINK-1745 > URL: https://issues.apache.org/jira/browse/FLINK-1745 > Project: Flink > Issue Type: New Feature > Components: Machine Learning Library > Reporter: Till Rohrmann > Assignee: Daniel Blazevski > Labels: ML, Starter > > Even though the k-nearest-neighbours (kNN) [1,2] algorithm is quite trivial, > it is still used as a means to classify data and to do regression. This issue > focuses on the implementation of an exact kNN (H-BNLJ, H-BRJ) algorithm as > proposed in [2]. > Could be a starter task. > Resources: > [1] [http://en.wikipedia.org/wiki/K-nearest_neighbors_algorithm] > [2] [https://www.cs.utah.edu/~lifeifei/papers/mrknnj.pdf]
[jira] [Commented] (FLINK-3574) Implement math functions for Table API
[ https://issues.apache.org/jira/browse/FLINK-3574?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15185644#comment-15185644 ] ASF GitHub Bot commented on FLINK-3574: --- GitHub user dawidwys opened a pull request: https://github.com/apache/flink/pull/1775 [FLINK-3574] Implement math functions for Table API You can merge this pull request into a Git repository by running: $ git pull https://github.com/dawidwys/flink mathFunctions Alternatively you can review and apply these changes as the patch at: https://github.com/apache/flink/pull/1775.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #1775 commit 1ff3a297ae2bb2d0733b3fca68bd91c54301f556 Author: dawid Date: 2016-03-08T19:47:42Z [FLINK-3574] Implement math functions for Table API > Implement math functions for Table API > -- > > Key: FLINK-3574 > URL: https://issues.apache.org/jira/browse/FLINK-3574 > Project: Flink > Issue Type: Sub-task > Components: Table API > Reporter: Timo Walther > Assignee: Dawid Wysakowicz > > {code} > MOD > EXP > POWER > LN > LOG10 > ABS > {code}
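For reference, the conventional semantics of the six functions listed in the issue, shown here with Python's `math` module as a standalone sketch (not the Table API implementation itself):

```python
import math

# Reference semantics of the math functions requested in FLINK-3574,
# illustrated with Python's stdlib rather than the Table API.
examples = {
    "MOD":   7 % 3,             # remainder of integer division
    "EXP":   math.exp(1.0),     # e raised to the argument
    "POWER": math.pow(2, 10),   # base raised to an exponent
    "LN":    math.log(math.e),  # natural logarithm
    "LOG10": math.log10(1000),  # base-10 logarithm
    "ABS":   abs(-5),           # absolute value
}
```

An implementation would map each of these to the corresponding Calcite/SQL operator so expressions like `'a.abs()` translate cleanly.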
[jira] [Assigned] (FLINK-3593) DistinctITCase is failing
[ https://issues.apache.org/jira/browse/FLINK-3593?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Fabian Hueske reassigned FLINK-3593: Assignee: Fabian Hueske > DistinctITCase is failing > - > > Key: FLINK-3593 > URL: https://issues.apache.org/jira/browse/FLINK-3593 > Project: Flink > Issue Type: Bug > Components: Table API >Reporter: Vasia Kalavri >Assignee: Fabian Hueske > Fix For: 1.1.0 > > > Failed tests: > DistinctITCase.testDistinctAfterAggregate:70 Internal error: Error while > applying rule DataSetAggregateRule, args > [rel#28:FlinkAggregate.FLINK.[](input=rel#27:Subset#5.FLINK.[],group={1})] > DistinctITCase.testDistinctAfterAggregate:70 Internal error: Error while > applying rule DataSetAggregateRule, args > [rel#86:FlinkAggregate.FLINK.[](input=rel#85:Subset#17.FLINK.[],group={1})] > DistinctITCase.testDistinct:53 Internal error: Error while applying rule > DataSetAggregateRule, args > [rel#53:FlinkAggregate.FLINK.[](input=rel#52:Subset#10.FLINK.[],group={1})] > DistinctITCase.testDistinct:53 Internal error: Error while applying rule > DataSetAggregateRule, args > [rel#111:FlinkAggregate.FLINK.[](input=rel#110:Subset#22.FLINK.[],group={1})] > DistinctITCase.testDistinctAfterAggregate:56 Internal error: Error while > applying rule DataSetAggregateRule, args > [rel#28:FlinkAggregate.FLINK.[](input=rel#27:Subset#5.FLINK.[],group={1})] > DistinctITCase.testDistinctAfterAggregate:56 Internal error: Error while > applying rule DataSetAggregateRule, args > [rel#86:FlinkAggregate.FLINK.[](input=rel#85:Subset#17.FLINK.[],group={1})] > DistinctITCase.testDistinct:44 Internal error: Error while applying rule > DataSetAggregateRule, args > [rel#53:FlinkAggregate.FLINK.[](input=rel#52:Subset#10.FLINK.[],group={1})] > DistinctITCase.testDistinct:44 Internal error: Error while applying rule > DataSetAggregateRule, args > [rel#111:FlinkAggregate.FLINK.[](input=rel#110:Subset#22.FLINK.[],group={1})] -- This message was sent by Atlassian JIRA 
(v6.3.4#6332)
[jira] [Resolved] (FLINK-3593) DistinctITCase is failing
[ https://issues.apache.org/jira/browse/FLINK-3593?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Fabian Hueske resolved FLINK-3593. -- Resolution: Fixed Fix Version/s: 1.1.0 Fixed on tableOnCalcite branch > DistinctITCase is failing > - > > Key: FLINK-3593 > URL: https://issues.apache.org/jira/browse/FLINK-3593 > Project: Flink > Issue Type: Bug > Components: Table API >Reporter: Vasia Kalavri >Assignee: Fabian Hueske > Fix For: 1.1.0 > > > Failed tests: > DistinctITCase.testDistinctAfterAggregate:70 Internal error: Error while > applying rule DataSetAggregateRule, args > [rel#28:FlinkAggregate.FLINK.[](input=rel#27:Subset#5.FLINK.[],group={1})] > DistinctITCase.testDistinctAfterAggregate:70 Internal error: Error while > applying rule DataSetAggregateRule, args > [rel#86:FlinkAggregate.FLINK.[](input=rel#85:Subset#17.FLINK.[],group={1})] > DistinctITCase.testDistinct:53 Internal error: Error while applying rule > DataSetAggregateRule, args > [rel#53:FlinkAggregate.FLINK.[](input=rel#52:Subset#10.FLINK.[],group={1})] > DistinctITCase.testDistinct:53 Internal error: Error while applying rule > DataSetAggregateRule, args > [rel#111:FlinkAggregate.FLINK.[](input=rel#110:Subset#22.FLINK.[],group={1})] > DistinctITCase.testDistinctAfterAggregate:56 Internal error: Error while > applying rule DataSetAggregateRule, args > [rel#28:FlinkAggregate.FLINK.[](input=rel#27:Subset#5.FLINK.[],group={1})] > DistinctITCase.testDistinctAfterAggregate:56 Internal error: Error while > applying rule DataSetAggregateRule, args > [rel#86:FlinkAggregate.FLINK.[](input=rel#85:Subset#17.FLINK.[],group={1})] > DistinctITCase.testDistinct:44 Internal error: Error while applying rule > DataSetAggregateRule, args > [rel#53:FlinkAggregate.FLINK.[](input=rel#52:Subset#10.FLINK.[],group={1})] > DistinctITCase.testDistinct:44 Internal error: Error while applying rule > DataSetAggregateRule, args > [rel#111:FlinkAggregate.FLINK.[](input=rel#110:Subset#22.FLINK.[],group={1})] -- 
[jira] [Updated] (FLINK-3593) DistinctITCase is failing
[ https://issues.apache.org/jira/browse/FLINK-3593?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vasia Kalavri updated FLINK-3593: - Description: Failed tests: DistinctITCase.testDistinctAfterAggregate:70 Internal error: Error while applying rule DataSetAggregateRule, args [rel#28:FlinkAggregate.FLINK.[](input=rel#27:Subset#5.FLINK.[],group={1})] DistinctITCase.testDistinctAfterAggregate:70 Internal error: Error while applying rule DataSetAggregateRule, args [rel#86:FlinkAggregate.FLINK.[](input=rel#85:Subset#17.FLINK.[],group={1})] DistinctITCase.testDistinct:53 Internal error: Error while applying rule DataSetAggregateRule, args [rel#53:FlinkAggregate.FLINK.[](input=rel#52:Subset#10.FLINK.[],group={1})] DistinctITCase.testDistinct:53 Internal error: Error while applying rule DataSetAggregateRule, args [rel#111:FlinkAggregate.FLINK.[](input=rel#110:Subset#22.FLINK.[],group={1})] DistinctITCase.testDistinctAfterAggregate:56 Internal error: Error while applying rule DataSetAggregateRule, args [rel#28:FlinkAggregate.FLINK.[](input=rel#27:Subset#5.FLINK.[],group={1})] DistinctITCase.testDistinctAfterAggregate:56 Internal error: Error while applying rule DataSetAggregateRule, args [rel#86:FlinkAggregate.FLINK.[](input=rel#85:Subset#17.FLINK.[],group={1})] DistinctITCase.testDistinct:44 Internal error: Error while applying rule DataSetAggregateRule, args [rel#53:FlinkAggregate.FLINK.[](input=rel#52:Subset#10.FLINK.[],group={1})] DistinctITCase.testDistinct:44 Internal error: Error while applying rule DataSetAggregateRule, args [rel#111:FlinkAggregate.FLINK.[](input=rel#110:Subset#22.FLINK.[],group={1})] was: The {{DistinctITCase}} test is failing on the {{tableOnCalcite}} branch: {code} Failed tests: DistinctITCase.testDistinctAfterAggregate:70 Internal error: Error while applying rule DataSetAggregateRule, args [rel#28:FlinkAggregate.FLINK.[](input=rel#27:Subset#5.FLINK.[],group={1})] DistinctITCase.testDistinctAfterAggregate:70 Internal error: 
Error while applying rule DataSetAggregateRule, args [rel#86:FlinkAggregate.FLINK.[](input=rel#85:Subset#17.FLINK.[],group={1})] DistinctITCase.testDistinct:53 Internal error: Error while applying rule DataSetAggregateRule, args [rel#53:FlinkAggregate.FLINK.[](input=rel#52:Subset#10.FLINK.[],group={1})] DistinctITCase.testDistinct:53 Internal error: Error while applying rule DataSetAggregateRule, args [rel#111:FlinkAggregate.FLINK.[](input=rel#110:Subset#22.FLINK.[],group={1})] DistinctITCase.testDistinctAfterAggregate:56 Internal error: Error while applying rule DataSetAggregateRule, args [rel#28:FlinkAggregate.FLINK.[](input=rel#27:Subset#5.FLINK.[],group={1})] DistinctITCase.testDistinctAfterAggregate:56 Internal error: Error while applying rule DataSetAggregateRule, args [rel#86:FlinkAggregate.FLINK.[](input=rel#85:Subset#17.FLINK.[],group={1})] DistinctITCase.testDistinct:44 Internal error: Error while applying rule DataSetAggregateRule, args [rel#53:FlinkAggregate.FLINK.[](input=rel#52:Subset#10.FLINK.[],group={1})] DistinctITCase.testDistinct:44 Internal error: Error while applying rule DataSetAggregateRule, args [rel#111:FlinkAggregate.FLINK.[](input=rel#110:Subset#22.FLINK.[],group={1})] {code} > DistinctITCase is failing > - > > Key: FLINK-3593 > URL: https://issues.apache.org/jira/browse/FLINK-3593 > Project: Flink > Issue Type: Bug > Components: Table API >Reporter: Vasia Kalavri > > Failed tests: > DistinctITCase.testDistinctAfterAggregate:70 Internal error: Error while > applying rule DataSetAggregateRule, args > [rel#28:FlinkAggregate.FLINK.[](input=rel#27:Subset#5.FLINK.[],group={1})] > DistinctITCase.testDistinctAfterAggregate:70 Internal error: Error while > applying rule DataSetAggregateRule, args > [rel#86:FlinkAggregate.FLINK.[](input=rel#85:Subset#17.FLINK.[],group={1})] > DistinctITCase.testDistinct:53 Internal error: Error while applying rule > DataSetAggregateRule, args > 
[rel#53:FlinkAggregate.FLINK.[](input=rel#52:Subset#10.FLINK.[],group={1})] > DistinctITCase.testDistinct:53 Internal error: Error while applying rule > DataSetAggregateRule, args > [rel#111:FlinkAggregate.FLINK.[](input=rel#110:Subset#22.FLINK.[],group={1})] > DistinctITCase.testDistinctAfterAggregate:56 Internal error: Error while > applying rule DataSetAggregateRule, args > [rel#28:FlinkAggregate.FLINK.[](input=rel#27:Subset#5.FLINK.[],group={1})] > DistinctITCase.testDistinctAfterAggregate:56 Internal error: Error while > applying rule DataSetAggregateRule, args > [rel#86:FlinkAggregate.FLINK.[](input=rel#85:Subset#17.FLINK.[],group={1})] > DistinctITCase.testDistinct:44 Internal error: Error while applying rule > DataSetAggregateRule, args >
[jira] [Created] (FLINK-3593) DistinctITCase is failing
Vasia Kalavri created FLINK-3593: Summary: DistinctITCase is failing Key: FLINK-3593 URL: https://issues.apache.org/jira/browse/FLINK-3593 Project: Flink Issue Type: Bug Components: Table API Reporter: Vasia Kalavri The {{DistinctITCase}} test is failing on the {{tableOnCalcite}} branch: {code} Failed tests: DistinctITCase.testDistinctAfterAggregate:70 Internal error: Error while applying rule DataSetAggregateRule, args [rel#28:FlinkAggregate.FLINK.[](input=rel#27:Subset#5.FLINK.[],group={1})] DistinctITCase.testDistinctAfterAggregate:70 Internal error: Error while applying rule DataSetAggregateRule, args [rel#86:FlinkAggregate.FLINK.[](input=rel#85:Subset#17.FLINK.[],group={1})] DistinctITCase.testDistinct:53 Internal error: Error while applying rule DataSetAggregateRule, args [rel#53:FlinkAggregate.FLINK.[](input=rel#52:Subset#10.FLINK.[],group={1})] DistinctITCase.testDistinct:53 Internal error: Error while applying rule DataSetAggregateRule, args [rel#111:FlinkAggregate.FLINK.[](input=rel#110:Subset#22.FLINK.[],group={1})] DistinctITCase.testDistinctAfterAggregate:56 Internal error: Error while applying rule DataSetAggregateRule, args [rel#28:FlinkAggregate.FLINK.[](input=rel#27:Subset#5.FLINK.[],group={1})] DistinctITCase.testDistinctAfterAggregate:56 Internal error: Error while applying rule DataSetAggregateRule, args [rel#86:FlinkAggregate.FLINK.[](input=rel#85:Subset#17.FLINK.[],group={1})] DistinctITCase.testDistinct:44 Internal error: Error while applying rule DataSetAggregateRule, args [rel#53:FlinkAggregate.FLINK.[](input=rel#52:Subset#10.FLINK.[],group={1})] DistinctITCase.testDistinct:44 Internal error: Error while applying rule DataSetAggregateRule, args [rel#111:FlinkAggregate.FLINK.[](input=rel#110:Subset#22.FLINK.[],group={1})] {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Closed] (FLINK-3592) Update setup quickstart
[ https://issues.apache.org/jira/browse/FLINK-3592?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ufuk Celebi closed FLINK-3592. -- Resolution: Fixed Implemented in bb9c1fa (release-1.0) and be1cf9d (master). > Update setup quickstart > --- > > Key: FLINK-3592 > URL: https://issues.apache.org/jira/browse/FLINK-3592 > Project: Flink > Issue Type: Improvement > Components: Documentation >Reporter: Ufuk Celebi >Assignee: Ufuk Celebi > > The setup quickstart is outdated and shows a batch example instead of a > streaming one. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (FLINK-3592) Update setup quickstart
Ufuk Celebi created FLINK-3592: -- Summary: Update setup quickstart Key: FLINK-3592 URL: https://issues.apache.org/jira/browse/FLINK-3592 Project: Flink Issue Type: Improvement Components: Documentation Reporter: Ufuk Celebi Assignee: Ufuk Celebi The setup quickstart is outdated and shows a batch example instead of a streaming one. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (FLINK-3591) Replace Quickstart K-Means Example by Streaming Example
[ https://issues.apache.org/jira/browse/FLINK-3591?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aljoscha Krettek resolved FLINK-3591. - Resolution: Fixed Fix Version/s: 1.0.1 1.1.0 Changed in https://github.com/apache/flink/commit/35ad1972d1604b4f69a9bfb12f00c280ad6262f8 > Replace Quickstart K-Means Example by Streaming Example > --- > > Key: FLINK-3591 > URL: https://issues.apache.org/jira/browse/FLINK-3591 > Project: Flink > Issue Type: Improvement > Components: Documentation >Reporter: Aljoscha Krettek >Assignee: Aljoscha Krettek > Fix For: 1.1.0, 1.0.1 > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-3591) Replace Quickstart K-Means Example by Streaming Example
[ https://issues.apache.org/jira/browse/FLINK-3591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15185154#comment-15185154 ] ASF GitHub Bot commented on FLINK-3591: --- Github user aljoscha commented on the pull request: https://github.com/apache/flink/pull/1774#issuecomment-193844948 Alright, I merged it with the fixes. > Replace Quickstart K-Means Example by Streaming Example > --- > > Key: FLINK-3591 > URL: https://issues.apache.org/jira/browse/FLINK-3591 > Project: Flink > Issue Type: Improvement > Components: Documentation >Reporter: Aljoscha Krettek >Assignee: Aljoscha Krettek > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-3591) Replace Quickstart K-Means Example by Streaming Example
[ https://issues.apache.org/jira/browse/FLINK-3591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15185155#comment-15185155 ] ASF GitHub Bot commented on FLINK-3591: --- Github user aljoscha closed the pull request at: https://github.com/apache/flink/pull/1774 > Replace Quickstart K-Means Example by Streaming Example > --- > > Key: FLINK-3591 > URL: https://issues.apache.org/jira/browse/FLINK-3591 > Project: Flink > Issue Type: Improvement > Components: Documentation >Reporter: Aljoscha Krettek >Assignee: Aljoscha Krettek > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-3579) Improve String concatenation
[ https://issues.apache.org/jira/browse/FLINK-3579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15185140#comment-15185140 ] ramkrishna.s.vasudevan commented on FLINK-3579: --- Is it possible for me to work on this JIRA, if you are not going to take it up immediately [~twalthr]? And also only if this JIRA is not critical for your Table work to progress. > Improve String concatenation > > > Key: FLINK-3579 > URL: https://issues.apache.org/jira/browse/FLINK-3579 > Project: Flink > Issue Type: Bug > Components: Table API >Reporter: Timo Walther >Priority: Minor > > Concatenation of a String and non-String does not work properly. > e.g. {{f0 + 42}} leads to RelBuilder Exception > ExpressionParser does not like {{f0 + 42.cast(STRING)}} either. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
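[Editor's note] For context on the FLINK-3579 report above: the expression {{f0 + 42}} mixes a String field and an integer literal, and the expected semantics mirror plain Java string concatenation, where the non-String operand is converted first. A minimal plain-Java sketch of those semantics (this is not Table API code; `concat` is a hypothetical helper used only for illustration):

```java
// Hypothetical sketch: plain-Java analogue of the Table API expression `f0 + 42`.
public class ConcatSketch {

    // Java's `+` converts the non-String operand via String.valueOf before
    // concatenating; this is the behavior the Table API expression is
    // expected to provide instead of raising a RelBuilder exception.
    static String concat(String f0, int literal) {
        return f0 + literal;
    }

    public static void main(String[] args) {
        System.out.println(concat("hello", 42)); // prints "hello42"
    }
}
```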
[jira] [Commented] (FLINK-3591) Replace Quickstart K-Means Example by Streaming Example
[ https://issues.apache.org/jira/browse/FLINK-3591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15185107#comment-15185107 ] ASF GitHub Bot commented on FLINK-3591: --- Github user uce commented on the pull request: https://github.com/apache/flink/pull/1774#issuecomment-193828321 Very very very nice! Thanks for doing this! I like it a lot. I was wondering whether we want to have a Scala version as well. What do you think? We can do it as a follow up if you don't have time for it now. > Replace Quickstart K-Means Example by Streaming Example > --- > > Key: FLINK-3591 > URL: https://issues.apache.org/jira/browse/FLINK-3591 > Project: Flink > Issue Type: Improvement > Components: Documentation >Reporter: Aljoscha Krettek >Assignee: Aljoscha Krettek > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-3591) Replace Quickstart K-Means Example by Streaming Example
[ https://issues.apache.org/jira/browse/FLINK-3591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15185112#comment-15185112 ] ASF GitHub Bot commented on FLINK-3591: --- Github user aljoscha commented on the pull request: https://github.com/apache/flink/pull/1774#issuecomment-193830427 Thanks @uce as well! :smiley: I incorporated all the fixes. The only thing I'm not happy with is the section with the window. I feel that I can't do it without some small explanation but it also can't be too big/in-depth. Scala I would do as a follow-up. > Replace Quickstart K-Means Example by Streaming Example > --- > > Key: FLINK-3591 > URL: https://issues.apache.org/jira/browse/FLINK-3591 > Project: Flink > Issue Type: Improvement > Components: Documentation >Reporter: Aljoscha Krettek >Assignee: Aljoscha Krettek > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (FLINK-2444) Add tests for HadoopInputFormats
[ https://issues.apache.org/jira/browse/FLINK-2444?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chesnay Schepler updated FLINK-2444: Assignee: Martin Liesenberg > Add tests for HadoopInputFormats > > > Key: FLINK-2444 > URL: https://issues.apache.org/jira/browse/FLINK-2444 > Project: Flink > Issue Type: Test > Components: Hadoop Compatibility, Tests >Affects Versions: 0.9.0, 0.10.0 >Reporter: Fabian Hueske >Assignee: Martin Liesenberg > Labels: starter > > The HadoopInputFormats and HadoopInputFormatBase classes are not sufficiently > covered by unit tests. > We need tests that ensure that the methods of the wrapped Hadoop InputFormats > are correctly called. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
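[Editor's note] The FLINK-2444 ticket above asks for tests that verify the wrapper classes correctly call through to the wrapped Hadoop InputFormats. Such delegation tests are often written with a recording stub. A minimal hand-rolled sketch of the idea, with hypothetical interface and class names (the real `HadoopInputFormatBase` API differs):

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: verify that a wrapper forwards every call to the
// wrapped format, in order, by recording invocations on a stub.
public class DelegationSketch {

    interface InputFormatLike {
        void open();
        boolean reachedEnd();
        void close();
    }

    // Stub that records which methods were invoked, and in what order.
    static class RecordingFormat implements InputFormatLike {
        final List<String> calls = new ArrayList<>();
        public void open()          { calls.add("open"); }
        public boolean reachedEnd() { calls.add("reachedEnd"); return true; }
        public void close()         { calls.add("close"); }
    }

    // Stand-in for the wrapper under test: it must forward each call.
    static class Wrapper implements InputFormatLike {
        private final InputFormatLike wrapped;
        Wrapper(InputFormatLike wrapped) { this.wrapped = wrapped; }
        public void open()          { wrapped.open(); }
        public boolean reachedEnd() { return wrapped.reachedEnd(); }
        public void close()         { wrapped.close(); }
    }

    public static void main(String[] args) {
        RecordingFormat inner = new RecordingFormat();
        Wrapper wrapper = new Wrapper(inner);
        wrapper.open();
        wrapper.reachedEnd();
        wrapper.close();
        System.out.println(inner.calls); // [open, reachedEnd, close]
    }
}
```

A mocking framework such as Mockito would replace the hand-rolled stub in real test code; the assertion pattern stays the same.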
[GitHub] flink pull request: [FLINK-3591] Replace Quickstart K-Means Exampl...
Github user uce commented on a diff in the pull request: https://github.com/apache/flink/pull/1774#discussion_r55376606 --- Diff: docs/quickstart/run_example_quickstart.md --- @@ -27,116 +27,360 @@ under the License. * This will be replaced by the TOC {:toc} -This guide walks you through the steps of executing an example program ([K-Means clustering](http://en.wikipedia.org/wiki/K-means_clustering)) on Flink. -On the way, you will see the a visualization of the program, the optimized execution plan, and track the progress of its execution. +In this guide we will start from scratch and fo from setting up a Flink project and running +a streaming analysis program on a Flink cluster. + +Wikipedia provides an IRC channel where all edits to the wiki are logged. We are going to +read this channel in Flink and count the number of bytes that each user edits within +a given window of time. This is easy enough to implement in a few minutes using Flink but it will +give you a good foundation from which to start building more complex analysis programs on your own. + +## Setting up a Maven Project + +We are going to use a Flink Maven Archetype for creating our project stucture. Please +see [Java API Quickstart]({{ site.baseurl }}/quickstart/java_api_quickstart.html) for more details +about this. For our purposes, the command to run is this: + +{% highlight bash %} +$ mvn archetype:generate\ +-DarchetypeGroupId=org.apache.flink\ +-DarchetypeArtifactId=flink-quickstart-java\ +-DarchetypeVersion=1.0.0\ +-DgroupId=wiki-edits\ +-DartifactId=wiki-edits\ +-Dversion=0.1\ +-Dpackage=wikiedits\ +-DinteractiveMode=false\ +{% endhighlight %} + +You can edit the `groupId`, `artifactId` and `package` if you like. 
With the above parameters, +maven will create a project structure that looks like this: + +{% highlight bash %} +$ tree wiki-edits +wiki-edits/ +├── pom.xml +└── src +└── main +├── java +│ └── wikiedits +│ ├── Job.java +│ ├── SocketTextStreamWordCount.java +│ └── WordCount.java +└── resources +└── log4j.properties +{% endhighlight %} + +There is our `pom.xml` file that already has the Flink dependencies added in the root directory and +several example Flink programs in `src/main/java`. We can delete the example programs, since +we are going to start from scratch: + +{% highlight bash %} +$ rm wiki-edits/src/main/java/wikiedits/*.java +{% endhighlight %} + +As a last step we need to add the Flink wikipedia connector as a dependency so that we can +use it in our program. Edit the `dependencies` section so that it looks like this: + +{% highlight xml %} + + +org.apache.flink +flink-java +${flink.version} + + +org.apache.flink +flink-streaming-java_2.10 +${flink.version} + + +org.apache.flink +flink-clients_2.10 +${flink.version} + + +org.apache.flink +flink-connector-wikiedits_2.10 +${flink.version} + + +{% endhighlight %} + +Notice the `flink-connector-wikiedits_2.10` dependency that was added. + +## Writing a Flink Program + +It's coding time. Fire up your favorite IDE and import the Maven project or open a text editor and +create the file `src/main/wikiedits/WikipediaAnalysis.java`: + +{% highlight java %} +package wikiedits; + +public class WikipediaAnalysis { + +public static void main(String[] args) throws Exception { + +} +} +{% endhighlight %} + +I admit it's very bare bones now but we will fill it as we go. Note, that I'll not give +import statements here since IDEs can add them automatically. At the end of this section I'll show +the complete code with import statements if you simply want to skip ahead and enter that in your +editor.
+ +The first step in a Flink program is to create a `StreamExecutionEnvironment` +(or `ExecutionEnvironment` if you are writing a batch job). This can be used to set execution +parameters and create sources for reading from external systems. So let's go ahead, add +this to the main method: + +{% highlight java %} +StreamExecutionEnvironment see = StreamExecutionEnvironment.getExecutionEnvironment(); +{% endhighlight %} + +Next, we will create a source that reads from the Wikipedia IRC log: + +{% highlight java %} +DataStream edits =
[jira] [Commented] (FLINK-3591) Replace Quickstart K-Means Example by Streaming Example
[ https://issues.apache.org/jira/browse/FLINK-3591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15185080#comment-15185080 ] ASF GitHub Bot commented on FLINK-3591: --- Github user uce commented on a diff in the pull request: https://github.com/apache/flink/pull/1774#discussion_r55375032 --- Diff: docs/quickstart/run_example_quickstart.md --- @@ -27,116 +27,360 @@ under the License. * This will be replaced by the TOC {:toc} -This guide walks you through the steps of executing an example program ([K-Means clustering](http://en.wikipedia.org/wiki/K-means_clustering)) on Flink. -On the way, you will see the a visualization of the program, the optimized execution plan, and track the progress of its execution. +In this guide we will start from scratch and fo from setting up a Flink project and running --- End diff -- go from to? > Replace Quickstart K-Means Example by Streaming Example > --- > > Key: FLINK-3591 > URL: https://issues.apache.org/jira/browse/FLINK-3591 > Project: Flink > Issue Type: Improvement > Components: Documentation >Reporter: Aljoscha Krettek >Assignee: Aljoscha Krettek > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[GitHub] flink pull request: [FLINK-3591] Replace Quickstart K-Means Exampl...
Github user uce commented on a diff in the pull request: https://github.com/apache/flink/pull/1774#discussion_r55375307 --- Diff: docs/quickstart/run_example_quickstart.md --- @@ -27,116 +27,360 @@ under the License. * This will be replaced by the TOC {:toc} -This guide walks you through the steps of executing an example program ([K-Means clustering](http://en.wikipedia.org/wiki/K-means_clustering)) on Flink. -On the way, you will see the a visualization of the program, the optimized execution plan, and track the progress of its execution. +In this guide we will start from scratch and fo from setting up a Flink project and running +a streaming analysis program on a Flink cluster. + +Wikipedia provides an IRC channel where all edits to the wiki are logged. We are going to +read this channel in Flink and count the number of bytes that each user edits within +a given window of time. This is easy enough to implement in a few minutes using Flink but it will +give you a good foundation from which to start building more complex analysis programs on your own. + +## Setting up a Maven Project + +We are going to use a Flink Maven Archetype for creating our project stucture. Please +see [Java API Quickstart]({{ site.baseurl }}/quickstart/java_api_quickstart.html) for more details +about this. For our purposes, the command to run is this: + +{% highlight bash %} +$ mvn archetype:generate\ +-DarchetypeGroupId=org.apache.flink\ +-DarchetypeArtifactId=flink-quickstart-java\ +-DarchetypeVersion=1.0.0\ +-DgroupId=wiki-edits\ +-DartifactId=wiki-edits\ +-Dversion=0.1\ +-Dpackage=wikiedits\ +-DinteractiveMode=false\ +{% endhighlight %} + +You can edit the `groupId`, `artifactId` and `package` if you like. 
With the above parameters, +maven will create a project structure that looks like this: + +{% highlight bash %} +$ tree wiki-edits +wiki-edits/ +├── pom.xml +└── src +└── main +├── java +│ └── wikiedits +│ ├── Job.java +│ ├── SocketTextStreamWordCount.java +│ └── WordCount.java +└── resources +└── log4j.properties +{% endhighlight %} + +There is our `pom.xml` file that already has the Flink dependencies added in the root directory and +several example Flink programs in `src/main/java`. We can delete the example programs, since +we are going to start from scratch: + +{% highlight bash %} +$ rm wiki-edits/src/main/java/wikiedits/*.java +{% endhighlight %} + +As a last step we need to add the Flink wikipedia connector as a dependency so that we can --- End diff -- Wikipedia capitalization --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[jira] [Commented] (FLINK-3591) Replace Quickstart K-Means Example by Streaming Example
[ https://issues.apache.org/jira/browse/FLINK-3591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15185099#comment-15185099 ] ASF GitHub Bot commented on FLINK-3591: --- Github user uce commented on a diff in the pull request: https://github.com/apache/flink/pull/1774#discussion_r55376436 --- Diff: docs/quickstart/run_example_quickstart.md --- @@ -27,116 +27,360 @@ under the License. * This will be replaced by the TOC {:toc} -This guide walks you through the steps of executing an example program ([K-Means clustering](http://en.wikipedia.org/wiki/K-means_clustering)) on Flink. -On the way, you will see the a visualization of the program, the optimized execution plan, and track the progress of its execution. +In this guide we will start from scratch and fo from setting up a Flink project and running +a streaming analysis program on a Flink cluster. + +Wikipedia provides an IRC channel where all edits to the wiki are logged. We are going to +read this channel in Flink and count the number of bytes that each user edits within +a given window of time. This is easy enough to implement in a few minutes using Flink but it will +give you a good foundation from which to start building more complex analysis programs on your own. + +## Setting up a Maven Project + +We are going to use a Flink Maven Archetype for creating our project stucture. Please +see [Java API Quickstart]({{ site.baseurl }}/quickstart/java_api_quickstart.html) for more details +about this. For our purposes, the command to run is this: + +{% highlight bash %} +$ mvn archetype:generate\ +-DarchetypeGroupId=org.apache.flink\ +-DarchetypeArtifactId=flink-quickstart-java\ +-DarchetypeVersion=1.0.0\ +-DgroupId=wiki-edits\ +-DartifactId=wiki-edits\ +-Dversion=0.1\ +-Dpackage=wikiedits\ +-DinteractiveMode=false\ +{% endhighlight %} + +You can edit the `groupId`, `artifactId` and `package` if you like. 
With the above parameters, +maven will create a project structure that looks like this: + +{% highlight bash %} +$ tree wiki-edits +wiki-edits/ +├── pom.xml +└── src +└── main +├── java +│ └── wikiedits +│ ├── Job.java +│ ├── SocketTextStreamWordCount.java +│ └── WordCount.java +└── resources +└── log4j.properties +{% endhighlight %} + +There is our `pom.xml` file that already has the Flink dependencies added in the root directory and +several example Flink programs in `src/main/java`. We can delete the example programs, since +we are going to start from scratch: + +{% highlight bash %} +$ rm wiki-edits/src/main/java/wikiedits/*.java +{% endhighlight %} + +As a last step we need to add the Flink wikipedia connector as a dependency so that we can +use it in our program. Edit the `dependencies` section so that it looks like this: + +{% highlight xml %} + + +org.apache.flink +flink-java +${flink.version} + + +org.apache.flink +flink-streaming-java_2.10 +${flink.version} + + +org.apache.flink +flink-clients_2.10 +${flink.version} + + +org.apache.flink +flink-connector-wikiedits_2.10 +${flink.version} + + +{% endhighlight %} + +Notice the `flink-connector-wikiedits_2.10` dependency that was added. + +## Writing a Flink Program + +It's coding time. Fire up your favorite IDE and import the Maven project or open a text editor and +create the file `src/main/wikiedits/WikipediaAnalysis.java`: + +{% highlight java %} +package wikiedits; + +public class WikipediaAnalysis { + +public static void main(String[] args) throws Exception { + +} +} +{% endhighlight %} + +I admit it's very bare bones now but we will fill it as we go. Note, that I'll not give +import statements here since IDEs can add them automatically. At the end of this section I'll show +the complete code with import statements if you simply want to skip ahead and enter that in your +editor. 
+ +The first step in a Flink program is to create a `StreamExecutionEnvironment` +(or `ExecutionEnvironment` if you are writing a batch job). This can be used to set execution +parameters and create sources for reading from external systems. So let's go ahead, add +this to the main method: + +{% highlight java %} +StreamExecutionEnvironment see = StreamExecutionEnvironment.getExecutionEnvironment(); +{%
[GitHub] flink pull request: [FLINK-3591] Replace Quickstart K-Means Exampl...
Github user uce commented on a diff in the pull request: https://github.com/apache/flink/pull/1774#discussion_r55376365 --- Diff: docs/quickstart/run_example_quickstart.md --- @@ -27,116 +27,360 @@ under the License. * This will be replaced by the TOC {:toc} -This guide walks you through the steps of executing an example program ([K-Means clustering](http://en.wikipedia.org/wiki/K-means_clustering)) on Flink. -On the way, you will see the a visualization of the program, the optimized execution plan, and track the progress of its execution. +In this guide we will start from scratch and fo from setting up a Flink project and running +a streaming analysis program on a Flink cluster. + +Wikipedia provides an IRC channel where all edits to the wiki are logged. We are going to +read this channel in Flink and count the number of bytes that each user edits within +a given window of time. This is easy enough to implement in a few minutes using Flink but it will +give you a good foundation from which to start building more complex analysis programs on your own. + +## Setting up a Maven Project + +We are going to use a Flink Maven Archetype for creating our project stucture. Please +see [Java API Quickstart]({{ site.baseurl }}/quickstart/java_api_quickstart.html) for more details +about this. For our purposes, the command to run is this: + +{% highlight bash %} +$ mvn archetype:generate\ +-DarchetypeGroupId=org.apache.flink\ +-DarchetypeArtifactId=flink-quickstart-java\ +-DarchetypeVersion=1.0.0\ +-DgroupId=wiki-edits\ +-DartifactId=wiki-edits\ +-Dversion=0.1\ +-Dpackage=wikiedits\ +-DinteractiveMode=false\ +{% endhighlight %} + +You can edit the `groupId`, `artifactId` and `package` if you like. 
With the above parameters, +maven will create a project structure that looks like this: + +{% highlight bash %} +$ tree wiki-edits +wiki-edits/ +├── pom.xml +└── src +└── main +├── java +│ └── wikiedits +│ ├── Job.java +│ ├── SocketTextStreamWordCount.java +│ └── WordCount.java +└── resources +└── log4j.properties +{% endhighlight %} + +There is our `pom.xml` file that already has the Flink dependencies added in the root directory and +several example Flink programs in `src/main/java`. We can delete the example programs, since +we are going to start from scratch: + +{% highlight bash %} +$ rm wiki-edits/src/main/java/wikiedits/*.java +{% endhighlight %} + +As a last step we need to add the Flink wikipedia connector as a dependency so that we can +use it in our program. Edit the `dependencies` section so that it looks like this: + +{% highlight xml %} + + +org.apache.flink +flink-java +${flink.version} + + +org.apache.flink +flink-streaming-java_2.10 +${flink.version} + + +org.apache.flink +flink-clients_2.10 +${flink.version} + + +org.apache.flink +flink-connector-wikiedits_2.10 +${flink.version} + + +{% endhighlight %} + +Notice the `flink-connector-wikiedits_2.10` dependency that was added. + +## Writing a Flink Program + +It's coding time. Fire up your favorite IDE and import the Maven project or open a text editor and +create the file `src/main/wikiedits/WikipediaAnalysis.java`: + +{% highlight java %} +package wikiedits; + +public class WikipediaAnalysis { + +public static void main(String[] args) throws Exception { + +} +} +{% endhighlight %} + +I admit it's very bare bones now but we will fill it as we go. Note, that I'll not give +import statements here since IDEs can add them automatically. At the end of this section I'll show +the complete code with import statements if you simply want to skip ahead and enter that in your +editor. 
+ +The first step in a Flink program is to create a `StreamExecutionEnvironment` +(or `ExecutionEnvironment` if you are writing a batch job). This can be used to set execution +parameters and create sources for reading from external systems. So let's go ahead, add +this to the main method: + +{% highlight java %} +StreamExecutionEnvironment see = StreamExecutionEnvironment.getExecutionEnvironment(); +{% endhighlight %} + +Next, we will create a source that reads from the Wikipedia IRC log: + +{% highlight java %} +DataStream edits =
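The quoted guide's goal is to count, per user, the bytes changed within a window of time. As a rough plain-Java sketch of that aggregation step (no Flink dependency; the class and method names here are illustrative, not from the guide):

```java
import java.util.HashMap;
import java.util.Map;

public class WindowedByteCount {

    // Sums the byte difference of each edit per user, as the windowed
    // aggregation in the guide's streaming job would do for one window.
    static Map<String, Long> aggregate(String[] users, long[] byteDiffs) {
        Map<String, Long> totals = new HashMap<>();
        for (int i = 0; i < users.length; i++) {
            // merge() adds the diff to any existing total for that user.
            totals.merge(users[i], byteDiffs[i], Long::sum);
        }
        return totals;
    }

    public static void main(String[] args) {
        Map<String, Long> totals = aggregate(
                new String[] {"alice", "bob", "alice"},
                new long[] {120, -40, 30});
        System.out.println(totals.get("alice")); // 150
        System.out.println(totals.get("bob"));   // -40
    }
}
```

In the actual job, Flink's `keyBy` and windowed aggregation perform this grouping incrementally over the stream rather than over in-memory arrays.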
[jira] [Commented] (FLINK-3591) Replace Quickstart K-Means Example by Streaming Example
[ https://issues.apache.org/jira/browse/FLINK-3591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15185097#comment-15185097 ] ASF GitHub Bot commented on FLINK-3591: --- Github user aljoscha commented on a diff in the pull request: https://github.com/apache/flink/pull/1774#discussion_r55376360 --- Diff: docs/quickstart/run_example_quickstart.md ---
[jira] [Commented] (FLINK-3591) Replace Quickstart K-Means Example by Streaming Example
[ https://issues.apache.org/jira/browse/FLINK-3591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15185098#comment-15185098 ] ASF GitHub Bot commented on FLINK-3591: --- Github user uce commented on a diff in the pull request: https://github.com/apache/flink/pull/1774#discussion_r55376365 --- Diff: docs/quickstart/run_example_quickstart.md ---
[GitHub] flink pull request: [FLINK-3591] Replace Quickstart K-Means Exampl...
Github user aljoscha commented on a diff in the pull request: https://github.com/apache/flink/pull/1774#discussion_r55376360 --- Diff: docs/quickstart/run_example_quickstart.md ---
[GitHub] flink pull request: [FLINK-3591] Replace Quickstart K-Means Exampl...
Github user uce commented on a diff in the pull request: https://github.com/apache/flink/pull/1774#discussion_r55375032 --- Diff: docs/quickstart/run_example_quickstart.md --- @@ -27,116 +27,360 @@ under the License. * This will be replaced by the TOC {:toc} -This guide walks you through the steps of executing an example program ([K-Means clustering](http://en.wikipedia.org/wiki/K-means_clustering)) on Flink. -On the way, you will see the a visualization of the program, the optimized execution plan, and track the progress of its execution. +In this guide we will start from scratch and fo from setting up a Flink project and running --- End diff -- go from to? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[jira] [Updated] (FLINK-2445) Add tests for HadoopOutputFormats
[ https://issues.apache.org/jira/browse/FLINK-2445?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chesnay Schepler updated FLINK-2445: Assignee: Martin Liesenberg > Add tests for HadoopOutputFormats > - > > Key: FLINK-2445 > URL: https://issues.apache.org/jira/browse/FLINK-2445 > Project: Flink > Issue Type: Test > Components: Hadoop Compatibility, Tests >Affects Versions: 0.9.1, 0.10.0 >Reporter: Fabian Hueske >Assignee: Martin Liesenberg > Labels: starter > > The HadoopOutputFormats and HadoopOutputFormatBase classes are not > sufficiently covered by unit tests. > We need tests that ensure that the methods of the wrapped Hadoop > OutputFormats are correctly called. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
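FLINK-2445 asks for tests that ensure the methods of the wrapped Hadoop OutputFormats are correctly called. A minimal delegation-check pattern for that kind of test, sketched with a hypothetical interface and wrapper (the real Flink `HadoopOutputFormatBase` and Hadoop `OutputFormat` types differ, and an actual test would likely use a mocking library):

```java
// Hypothetical stand-in for the Hadoop OutputFormat interface.
interface OutputFormat {
    void write(String record);
    void close();
}

// Hypothetical stand-in for Flink's wrapper: it must delegate every call.
class Wrapper implements OutputFormat {
    private final OutputFormat wrapped;
    Wrapper(OutputFormat wrapped) { this.wrapped = wrapped; }
    public void write(String record) { wrapped.write(record); }
    public void close() { wrapped.close(); }
}

// Counting stub: records how often each wrapped method was invoked.
class CountingFormat implements OutputFormat {
    int writes;
    int closes;
    public void write(String record) { writes++; }
    public void close() { closes++; }
}

public class DelegationTest {
    public static void main(String[] args) {
        CountingFormat inner = new CountingFormat();
        Wrapper wrapper = new Wrapper(inner);
        wrapper.write("a");
        wrapper.write("b");
        wrapper.close();
        // Each call on the wrapper reached the wrapped format exactly once.
        System.out.println(inner.writes + " " + inner.closes); // 2 1
    }
}
```

The same idea, with a mock in place of the hand-rolled counting stub, covers the ticket's requirement that the wrapper does not swallow or reorder calls.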
[jira] [Commented] (FLINK-3591) Replace Quickstart K-Means Example by Streaming Example
[ https://issues.apache.org/jira/browse/FLINK-3591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15185087#comment-15185087 ] ASF GitHub Bot commented on FLINK-3591: --- Github user uce commented on a diff in the pull request: https://github.com/apache/flink/pull/1774#discussion_r55375679 --- Diff: docs/quickstart/run_example_quickstart.md ---
[jira] [Closed] (FLINK-2315) Hadoop Writables cannot exploit implementing NormalizableKey
[ https://issues.apache.org/jira/browse/FLINK-2315?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chesnay Schepler closed FLINK-2315. --- Resolution: Invalid > Hadoop Writables cannot exploit implementing NormalizableKey > > > Key: FLINK-2315 > URL: https://issues.apache.org/jira/browse/FLINK-2315 > Project: Flink > Issue Type: Bug > Components: DataSet API >Affects Versions: 0.9 >Reporter: Stephan Ewen > Fix For: 1.0.0 > > > When one implements a type that extends {{hadoop.io.Writable}} and it > implements {{NormalizableKey}}, this is still not exploited. > The Writable comparator fails to recognize that. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-3591) Replace Quickstart K-Means Example by Streaming Example
[ https://issues.apache.org/jira/browse/FLINK-3591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15185090#comment-15185090 ] ASF GitHub Bot commented on FLINK-3591: --- Github user uce commented on a diff in the pull request: https://github.com/apache/flink/pull/1774#discussion_r55375868 --- Diff: docs/quickstart/run_example_quickstart.md ---
[GitHub] flink pull request: [FLINK-3591] Replace Quickstart K-Means Exampl...
Github user uce commented on a diff in the pull request: https://github.com/apache/flink/pull/1774#discussion_r55375868 --- Diff: docs/quickstart/run_example_quickstart.md ---
[jira] [Commented] (FLINK-3591) Replace Quickstart K-Means Example by Streaming Example
[ https://issues.apache.org/jira/browse/FLINK-3591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15185084#comment-15185084 ] ASF GitHub Bot commented on FLINK-3591: --- Github user uce commented on a diff in the pull request: https://github.com/apache/flink/pull/1774#discussion_r55375307 --- Diff: docs/quickstart/run_example_quickstart.md --- @@ -27,116 +27,360 @@ under the License. * This will be replaced by the TOC {:toc} -This guide walks you through the steps of executing an example program ([K-Means clustering](http://en.wikipedia.org/wiki/K-means_clustering)) on Flink. -On the way, you will see the a visualization of the program, the optimized execution plan, and track the progress of its execution. +In this guide we will start from scratch and fo from setting up a Flink project and running +a streaming analysis program on a Flink cluster. + +Wikipedia provides an IRC channel where all edits to the wiki are logged. We are going to +read this channel in Flink and count the number of bytes that each user edits within +a given window of time. This is easy enough to implement in a few minutes using Flink but it will +give you a good foundation from which to start building more complex analysis programs on your own. + +## Setting up a Maven Project + +We are going to use a Flink Maven Archetype for creating our project stucture. Please +see [Java API Quickstart]({{ site.baseurl }}/quickstart/java_api_quickstart.html) for more details +about this. For our purposes, the command to run is this: + +{% highlight bash %} +$ mvn archetype:generate\ +-DarchetypeGroupId=org.apache.flink\ +-DarchetypeArtifactId=flink-quickstart-java\ +-DarchetypeVersion=1.0.0\ +-DgroupId=wiki-edits\ +-DartifactId=wiki-edits\ +-Dversion=0.1\ +-Dpackage=wikiedits\ +-DinteractiveMode=false\ +{% endhighlight %} + +You can edit the `groupId`, `artifactId` and `package` if you like. 
With the above parameters, +maven will create a project structure that looks like this: + +{% highlight bash %} +$ tree wiki-edits +wiki-edits/ +├── pom.xml +└── src +└── main +├── java +│ └── wikiedits +│ ├── Job.java +│ ├── SocketTextStreamWordCount.java +│ └── WordCount.java +└── resources +└── log4j.properties +{% endhighlight %} + +There is our `pom.xml` file that already has the Flink dependencies added in the root directory and +several example Flink programs in `src/main/java`. We can delete the example programs, since +we are going to start from scratch: + +{% highlight bash %} +$ rm wiki-edits/src/main/java/wikiedits/*.java +{% endhighlight %} + +As a last step we need to add the Flink wikipedia connector as a dependency so that we can --- End diff -- Wikipedia capitalization > Replace Quickstart K-Means Example by Streaming Example > --- > > Key: FLINK-3591 > URL: https://issues.apache.org/jira/browse/FLINK-3591 > Project: Flink > Issue Type: Improvement > Components: Documentation >Reporter: Aljoscha Krettek >Assignee: Aljoscha Krettek > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
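The analysis the quoted guide describes (summing the number of bytes each user edits within a window of time) can be illustrated without Flink. The following plain-Java sketch shows only the per-user, per-window aggregation; the `Edit` record and its fields are hypothetical stand-ins, not the actual event type of the Wikipedia connector:

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class WindowedByteCount {

    // Hypothetical stand-in for a Wikipedia edit event; field names are illustrative only.
    record Edit(String user, int byteDiff, long timestampMillis) {}

    // Sum byte diffs per user for events inside [windowStart, windowStart + windowSizeMillis).
    static Map<String, Integer> countBytesPerUser(List<Edit> edits, long windowStart, long windowSizeMillis) {
        Map<String, Integer> counts = new HashMap<>();
        for (Edit e : edits) {
            if (e.timestampMillis() >= windowStart && e.timestampMillis() < windowStart + windowSizeMillis) {
                counts.merge(e.user(), e.byteDiff(), Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        List<Edit> edits = List.of(
                new Edit("alice", 120, 1_000L),
                new Edit("bob", -30, 2_000L),
                new Edit("alice", 80, 3_000L),
                new Edit("bob", 10, 10_000L)); // falls outside the 5-second window below
        Map<String, Integer> counts = countBytesPerUser(edits, 0L, 5_000L);
        System.out.println(counts.get("alice") + " " + counts.get("bob"));
    }
}
```

In the actual tutorial, Flink's windowing operators perform this grouping and summing over the live IRC stream; the sketch above only mirrors that logic over an in-memory list.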
[GitHub] flink pull request: [FLINK-3591] Replace Quickstart K-Means Exampl...
Github user uce commented on a diff in the pull request: https://github.com/apache/flink/pull/1774#discussion_r55375144 --- Diff: docs/quickstart/run_example_quickstart.md --- @@ -27,116 +27,360 @@ under the License. * This will be replaced by the TOC {:toc} -This guide walks you through the steps of executing an example program ([K-Means clustering](http://en.wikipedia.org/wiki/K-means_clustering)) on Flink. -On the way, you will see the a visualization of the program, the optimized execution plan, and track the progress of its execution. +In this guide we will start from scratch and fo from setting up a Flink project and running +a streaming analysis program on a Flink cluster. + +Wikipedia provides an IRC channel where all edits to the wiki are logged. We are going to +read this channel in Flink and count the number of bytes that each user edits within +a given window of time. This is easy enough to implement in a few minutes using Flink but it will +give you a good foundation from which to start building more complex analysis programs on your own. + +## Setting up a Maven Project + +We are going to use a Flink Maven Archetype for creating our project stucture. Please +see [Java API Quickstart]({{ site.baseurl }}/quickstart/java_api_quickstart.html) for more details +about this. For our purposes, the command to run is this: + +{% highlight bash %} +$ mvn archetype:generate\ +-DarchetypeGroupId=org.apache.flink\ +-DarchetypeArtifactId=flink-quickstart-java\ +-DarchetypeVersion=1.0.0\ +-DgroupId=wiki-edits\ +-DartifactId=wiki-edits\ +-Dversion=0.1\ +-Dpackage=wikiedits\ +-DinteractiveMode=false\ +{% endhighlight %} + +You can edit the `groupId`, `artifactId` and `package` if you like. With the above parameters, +maven will create a project structure that looks like this: --- End diff -- Maven capitalization --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. 
If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[jira] [Commented] (FLINK-3591) Replace Quickstart K-Means Example by Streaming Example
[ https://issues.apache.org/jira/browse/FLINK-3591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15185070#comment-15185070 ] ASF GitHub Bot commented on FLINK-3591: --- Github user vasia commented on a diff in the pull request: https://github.com/apache/flink/pull/1774#discussion_r55374388 --- Diff: docs/quickstart/run_example_quickstart.md --- @@ -27,116 +27,360 @@ under the License. * This will be replaced by the TOC {:toc} -This guide walks you through the steps of executing an example program ([K-Means clustering](http://en.wikipedia.org/wiki/K-means_clustering)) on Flink. -On the way, you will see the a visualization of the program, the optimized execution plan, and track the progress of its execution. +In this guide we will start from scratch and fo from setting up a Flink project and running +a streaming analysis program on a Flink cluster. + +Wikipedia provides an IRC channel where all edits to the wiki are logged. We are going to +read this channel in Flink and count the number of bytes that each user edits within +a given window of time. This is easy enough to implement in a few minutes using Flink but it will +give you a good foundation from which to start building more complex analysis programs on your own. + +## Setting up a Maven Project + +We are going to use a Flink Maven Archetype for creating our project stucture. Please +see [Java API Quickstart]({{ site.baseurl }}/quickstart/java_api_quickstart.html) for more details +about this. For our purposes, the command to run is this: + +{% highlight bash %} +$ mvn archetype:generate\ +-DarchetypeGroupId=org.apache.flink\ +-DarchetypeArtifactId=flink-quickstart-java\ +-DarchetypeVersion=1.0.0\ +-DgroupId=wiki-edits\ +-DartifactId=wiki-edits\ +-Dversion=0.1\ +-Dpackage=wikiedits\ +-DinteractiveMode=false\ +{% endhighlight %} + +You can edit the `groupId`, `artifactId` and `package` if you like. 
With the above parameters, +maven will create a project structure that looks like this: + +{% highlight bash %} +$ tree wiki-edits +wiki-edits/ +├── pom.xml +└── src +└── main +├── java +│ └── wikiedits +│ ├── Job.java +│ ├── SocketTextStreamWordCount.java +│ └── WordCount.java +└── resources +└── log4j.properties +{% endhighlight %} + +There is our `pom.xml` file that already has the Flink dependencies added in the root directory and +several example Flink programs in `src/main/java`. We can delete the example programs, since +we are going to start from scratch: + +{% highlight bash %} +$ rm wiki-edits/src/main/java/wikiedits/*.java +{% endhighlight %} + +As a last step we need to add the Flink wikipedia connector as a dependency so that we can +use it in our program. Edit the `dependencies` section so that it looks like this: + +{% highlight xml %} + + +org.apache.flink +flink-java +${flink.version} + + +org.apache.flink +flink-streaming-java_2.10 +${flink.version} + + +org.apache.flink +flink-clients_2.10 +${flink.version} + + +org.apache.flink +flink-connector-wikiedits_2.10 +${flink.version} + + +{% endhighlight %} + +Notice the `flink-connector-wikiedits_2.10` dependency that was added. + +## Writing a Flink Program + +It's coding time. Fire up your favorite IDE and import the Maven project or open a text editor and +create the file `src/main/wikiedits/WikipediaAnalysis.java`: + +{% highlight java %} +package wikiedits; + +public class WikipediaAnalysis { + +public static void main(String[] args) throws Exception { + +} +} +{% endhighlight %} + +I admit it's very bare bones now but we will fill it as we go. Note, that I'll not give +import statements here since IDEs can add them automatically. At the end of this section I'll show +the complete code with import statements if you simply want to skip ahead and enter that in your +editor. 
+ +The first step in a Flink program is to create a `StreamExecutionEnvironment` +(or `ExecutionEnvironment` if you are writing a batch job). This can be used to set execution +parameters and create sources for reading from external systems. So let's go ahead, add +this to the main method: + +{% highlight java %} +StreamExecutionEnvironment see = StreamExecutionEnvironment.getExecutionEnvironment(); +{% endhighlight %}
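The `dependencies` section in the quoted diff appears to have lost its XML tags during extraction (only element text survives). Assuming it follows the standard Flink 1.0 quickstart POM layout, the intended block would read roughly as follows; this is a reconstruction, not verbatim from the PR:

```xml
<dependencies>
  <dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-java</artifactId>
    <version>${flink.version}</version>
  </dependency>
  <dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-streaming-java_2.10</artifactId>
    <version>${flink.version}</version>
  </dependency>
  <dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-clients_2.10</artifactId>
    <version>${flink.version}</version>
  </dependency>
  <dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-connector-wikiedits_2.10</artifactId>
    <version>${flink.version}</version>
  </dependency>
</dependencies>
```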
[jira] [Commented] (FLINK-3591) Replace Quickstart K-Means Example by Streaming Example
[ https://issues.apache.org/jira/browse/FLINK-3591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15185065#comment-15185065 ] ASF GitHub Bot commented on FLINK-3591: --- Github user aljoscha commented on the pull request: https://github.com/apache/flink/pull/1774#issuecomment-193822714 Thanks for the thorough review @vasia, that was quick :smile:
[jira] [Commented] (FLINK-3591) Replace Quickstart K-Means Example by Streaming Example
[ https://issues.apache.org/jira/browse/FLINK-3591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15185064#comment-15185064 ] ASF GitHub Bot commented on FLINK-3591: --- Github user aljoscha commented on a diff in the pull request: https://github.com/apache/flink/pull/1774#discussion_r55373645
[jira] [Commented] (FLINK-3591) Replace Quickstart K-Means Example by Streaming Example
[ https://issues.apache.org/jira/browse/FLINK-3591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15185060#comment-15185060 ] ASF GitHub Bot commented on FLINK-3591: --- Github user vasia commented on the pull request: https://github.com/apache/flink/pull/1774#issuecomment-193821304 Just a few misspellings. Otherwise looks great!
[GitHub] flink pull request: Improve TaskManagerTest#testRemotePartitionNot...
Github user asfgit closed the pull request at: https://github.com/apache/flink/pull/1658
[jira] [Commented] (FLINK-3591) Replace Quickstart K-Means Example by Streaming Example
[ https://issues.apache.org/jira/browse/FLINK-3591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15185055#comment-15185055 ] ASF GitHub Bot commented on FLINK-3591: --- Github user vasia commented on a diff in the pull request: https://github.com/apache/flink/pull/1774#discussion_r55372164
[jira] [Commented] (FLINK-3591) Replace Quickstart K-Means Example by Streaming Example
[ https://issues.apache.org/jira/browse/FLINK-3591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15185051#comment-15185051 ] ASF GitHub Bot commented on FLINK-3591: --- Github user aljoscha commented on a diff in the pull request: https://github.com/apache/flink/pull/1774#discussion_r55371997 --- Diff: docs/quickstart/run_example_quickstart.md --- @@ -27,116 +27,360 @@ under the License. * This will be replaced by the TOC {:toc} -This guide walks you through the steps of executing an example program ([K-Means clustering](http://en.wikipedia.org/wiki/K-means_clustering)) on Flink. -On the way, you will see a visualization of the program, the optimized execution plan, and track the progress of its execution. +In this guide we will start from scratch and go from setting up a Flink project to running +a streaming analysis program on a Flink cluster. + +Wikipedia provides an IRC channel where all edits to the wiki are logged. We are going to +read this channel in Flink and count the number of bytes that each user edits within +a given window of time. This is easy enough to implement in a few minutes using Flink but it will +give you a good foundation from which to start building more complex analysis programs on your own. + +## Setting up a Maven Project + +We are going to use a Flink Maven Archetype for creating our project structure. Please +see [Java API Quickstart]({{ site.baseurl }}/quickstart/java_api_quickstart.html) for more details +about this. For our purposes, the command to run is this: + +{% highlight bash %} +$ mvn archetype:generate\ +-DarchetypeGroupId=org.apache.flink\ +-DarchetypeArtifactId=flink-quickstart-java\ +-DarchetypeVersion=1.0.0\ +-DgroupId=wiki-edits\ +-DartifactId=wiki-edits\ +-Dversion=0.1\ +-Dpackage=wikiedits\ +-DinteractiveMode=false +{% endhighlight %} + +You can edit the `groupId`, `artifactId` and `package` if you like. 
With the above parameters, +maven will create a project structure that looks like this: + +{% highlight bash %} +$ tree wiki-edits +wiki-edits/ +├── pom.xml +└── src +└── main +├── java +│ └── wikiedits +│ ├── Job.java +│ ├── SocketTextStreamWordCount.java +│ └── WordCount.java +└── resources +└── log4j.properties +{% endhighlight %} + +There is our `pom.xml` file that already has the Flink dependencies added in the root directory and +several example Flink programs in `src/main/java`. We can delete the example programs, since +we are going to start from scratch: + +{% highlight bash %} +$ rm wiki-edits/src/main/java/wikiedits/*.java +{% endhighlight %} + +As a last step we need to add the Flink Wikipedia connector as a dependency so that we can +use it in our program. Edit the `dependencies` section so that it looks like this: + +{% highlight xml %} +<dependencies> +<dependency> +<groupId>org.apache.flink</groupId> +<artifactId>flink-java</artifactId> +<version>${flink.version}</version> +</dependency> +<dependency> +<groupId>org.apache.flink</groupId> +<artifactId>flink-streaming-java_2.10</artifactId> +<version>${flink.version}</version> +</dependency> +<dependency> +<groupId>org.apache.flink</groupId> +<artifactId>flink-clients_2.10</artifactId> +<version>${flink.version}</version> +</dependency> +<dependency> +<groupId>org.apache.flink</groupId> +<artifactId>flink-connector-wikiedits_2.10</artifactId> +<version>${flink.version}</version> +</dependency> +</dependencies> +{% endhighlight %} + +Notice the `flink-connector-wikiedits_2.10` dependency that was added. + +## Writing a Flink Program + +It's coding time. Fire up your favorite IDE and import the Maven project or open a text editor and +create the file `src/main/java/wikiedits/WikipediaAnalysis.java`: + +{% highlight java %} +package wikiedits; + +public class WikipediaAnalysis { + +public static void main(String[] args) throws Exception { + +} +} +{% endhighlight %} + +I admit it's very bare bones now but we will fill it as we go. Note that I'll not give +import statements here since IDEs can add them automatically. At the end of this section I'll show +the complete code with import statements if you simply want to skip ahead and enter that in your +editor. 
+ +The first step in a Flink program is to create a `StreamExecutionEnvironment` +(or `ExecutionEnvironment` if you are writing a batch job). This can be used to set execution +parameters and create sources for reading from external systems. So let's go ahead and add +this to the main method: + +{% highlight java %} +StreamExecutionEnvironment see = StreamExecutionEnvironment.getExecutionEnvironment(); +{%
[GitHub] flink pull request: [FLINK-3591] Replace Quickstart K-Means Exampl...
Github user vasia commented on a diff in the pull request: https://github.com/apache/flink/pull/1774#discussion_r55371996 --- Diff: docs/quickstart/run_example_quickstart.md --- @@ -27,116 +27,360 @@ under the License. * This will be replaced by the TOC {:toc} -This guide walks you through the steps of executing an example program ([K-Means clustering](http://en.wikipedia.org/wiki/K-means_clustering)) on Flink. -On the way, you will see a visualization of the program, the optimized execution plan, and track the progress of its execution. +In this guide we will start from scratch and go from setting up a Flink project to running +a streaming analysis program on a Flink cluster. + +Wikipedia provides an IRC channel where all edits to the wiki are logged. We are going to +read this channel in Flink and count the number of bytes that each user edits within +a given window of time. This is easy enough to implement in a few minutes using Flink but it will +give you a good foundation from which to start building more complex analysis programs on your own. + +## Setting up a Maven Project + +We are going to use a Flink Maven Archetype for creating our project structure. Please +see [Java API Quickstart]({{ site.baseurl }}/quickstart/java_api_quickstart.html) for more details +about this. For our purposes, the command to run is this: + +{% highlight bash %} +$ mvn archetype:generate\ +-DarchetypeGroupId=org.apache.flink\ +-DarchetypeArtifactId=flink-quickstart-java\ +-DarchetypeVersion=1.0.0\ +-DgroupId=wiki-edits\ +-DartifactId=wiki-edits\ +-Dversion=0.1\ +-Dpackage=wikiedits\ +-DinteractiveMode=false +{% endhighlight %} + +You can edit the `groupId`, `artifactId` and `package` if you like. 
With the above parameters, +maven will create a project structure that looks like this: + +{% highlight bash %} +$ tree wiki-edits +wiki-edits/ +├── pom.xml +└── src +└── main +├── java +│ └── wikiedits +│ ├── Job.java +│ ├── SocketTextStreamWordCount.java +│ └── WordCount.java +└── resources +└── log4j.properties +{% endhighlight %} + +There is our `pom.xml` file that already has the Flink dependencies added in the root directory and +several example Flink programs in `src/main/java`. We can delete the example programs, since +we are going to start from scratch: + +{% highlight bash %} +$ rm wiki-edits/src/main/java/wikiedits/*.java +{% endhighlight %} + +As a last step we need to add the Flink Wikipedia connector as a dependency so that we can +use it in our program. Edit the `dependencies` section so that it looks like this: + +{% highlight xml %} +<dependencies> +<dependency> +<groupId>org.apache.flink</groupId> +<artifactId>flink-java</artifactId> +<version>${flink.version}</version> +</dependency> +<dependency> +<groupId>org.apache.flink</groupId> +<artifactId>flink-streaming-java_2.10</artifactId> +<version>${flink.version}</version> +</dependency> +<dependency> +<groupId>org.apache.flink</groupId> +<artifactId>flink-clients_2.10</artifactId> +<version>${flink.version}</version> +</dependency> +<dependency> +<groupId>org.apache.flink</groupId> +<artifactId>flink-connector-wikiedits_2.10</artifactId> +<version>${flink.version}</version> +</dependency> +</dependencies> +{% endhighlight %} + +Notice the `flink-connector-wikiedits_2.10` dependency that was added. + +## Writing a Flink Program + +It's coding time. Fire up your favorite IDE and import the Maven project or open a text editor and +create the file `src/main/java/wikiedits/WikipediaAnalysis.java`: + +{% highlight java %} +package wikiedits; + +public class WikipediaAnalysis { + +public static void main(String[] args) throws Exception { + +} +} +{% endhighlight %} + +I admit it's very bare bones now but we will fill it as we go. Note that I'll not give +import statements here since IDEs can add them automatically. At the end of this section I'll show +the complete code with import statements if you simply want to skip ahead and enter that in your +editor. 
+ +The first step in a Flink program is to create a `StreamExecutionEnvironment` +(or `ExecutionEnvironment` if you are writing a batch job). This can be used to set execution +parameters and create sources for reading from external systems. So let's go ahead and add +this to the main method: + +{% highlight java %} +StreamExecutionEnvironment see = StreamExecutionEnvironment.getExecutionEnvironment(); +{% endhighlight %} + +Next, we will create a source that reads from the Wikipedia IRC log: + +{% highlight java %} +DataStream edits =
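The quoted tutorial breaks off just as the source is created, but its goal is already stated above: sum the number of bytes each user changes. As a rough orientation, that core aggregation can be sketched in plain Java with no Flink dependencies; the `EditEvent` class and its fields are hypothetical stand-ins for the connector's edit-event type, not the connector's actual API:

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the connector's edit event: who edited,
// and by how many bytes the page changed.
class EditEvent {
    final String user;
    final int byteDiff;
    EditEvent(String user, int byteDiff) { this.user = user; this.byteDiff = byteDiff; }
}

public class ByteCount {
    // Sum the byte diffs per user -- the same aggregation the tutorial's
    // keyed, windowed Flink job performs, here over a plain in-memory list.
    static Map<String, Integer> totalsByUser(List<EditEvent> edits) {
        Map<String, Integer> totals = new HashMap<>();
        for (EditEvent e : edits) {
            totals.merge(e.user, e.byteDiff, Integer::sum);
        }
        return totals;
    }
}
```

In the real job the same per-user sum runs continuously over a keyed stream instead of a finished list.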
[jira] [Commented] (FLINK-1707) Add an Affinity Propagation Library Method
[ https://issues.apache.org/jira/browse/FLINK-1707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15185045#comment-15185045 ] Josep Rubió commented on FLINK-1707: Hi Vasia, First, sorry for my delay; I haven't had much time in the last weeks. I tried to respond to your questions; I don't know if I gave you enough info. I'll try to work on it this week/weekend. I have some doubts about how to store the vertex state; I'll write a section explaining what I did, but basically it uses Flink tuples. Thanks! > Add an Affinity Propagation Library Method > -- > > Key: FLINK-1707 > URL: https://issues.apache.org/jira/browse/FLINK-1707 > Project: Flink > Issue Type: New Feature > Components: Gelly >Reporter: Vasia Kalavri >Assignee: Josep Rubió >Priority: Minor > Labels: requires-design-doc > Attachments: Binary_Affinity_Propagation_in_Flink_design_doc.pdf > > > This issue proposes adding an implementation of the Affinity Propagation > algorithm as a Gelly library method and a corresponding example. > The algorithm is described in paper [1] and a description of a vertex-centric > implementation can be found in [2]. > [1]: http://www.psi.toronto.edu/affinitypropagation/FreyDueckScience07.pdf > [2]: http://event.cwi.nl/grades2014/00-ching-slides.pdf > Design doc: > https://docs.google.com/document/d/1QULalzPqMVICi8jRVs3S0n39pell2ZVc7RNemz_SGA4/edit?usp=sharing -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[GitHub] flink pull request: Improve TaskManagerTest#testRemotePartitionNot...
Github user zentol commented on the pull request: https://github.com/apache/flink/pull/1658#issuecomment-193817912 merging this. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[jira] [Commented] (FLINK-3591) Replace Quickstart K-Means Example by Streaming Example
[ https://issues.apache.org/jira/browse/FLINK-3591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15185042#comment-15185042 ] ASF GitHub Bot commented on FLINK-3591: --- Github user vasia commented on a diff in the pull request: https://github.com/apache/flink/pull/1774#discussion_r55371459 --- Diff: docs/quickstart/run_example_quickstart.md --- @@ -27,116 +27,360 @@ under the License. * This will be replaced by the TOC {:toc} -This guide walks you through the steps of executing an example program ([K-Means clustering](http://en.wikipedia.org/wiki/K-means_clustering)) on Flink. -On the way, you will see a visualization of the program, the optimized execution plan, and track the progress of its execution. +In this guide we will start from scratch and go from setting up a Flink project to running +a streaming analysis program on a Flink cluster. + +Wikipedia provides an IRC channel where all edits to the wiki are logged. We are going to +read this channel in Flink and count the number of bytes that each user edits within +a given window of time. This is easy enough to implement in a few minutes using Flink but it will +give you a good foundation from which to start building more complex analysis programs on your own. + +## Setting up a Maven Project + +We are going to use a Flink Maven Archetype for creating our project structure. Please +see [Java API Quickstart]({{ site.baseurl }}/quickstart/java_api_quickstart.html) for more details +about this. For our purposes, the command to run is this: + +{% highlight bash %} +$ mvn archetype:generate\ +-DarchetypeGroupId=org.apache.flink\ +-DarchetypeArtifactId=flink-quickstart-java\ +-DarchetypeVersion=1.0.0\ +-DgroupId=wiki-edits\ +-DartifactId=wiki-edits\ +-Dversion=0.1\ +-Dpackage=wikiedits\ +-DinteractiveMode=false +{% endhighlight %} + +You can edit the `groupId`, `artifactId` and `package` if you like. 
With the above parameters, +maven will create a project structure that looks like this: + +{% highlight bash %} +$ tree wiki-edits +wiki-edits/ +├── pom.xml +└── src +└── main +├── java +│ └── wikiedits +│ ├── Job.java +│ ├── SocketTextStreamWordCount.java +│ └── WordCount.java +└── resources +└── log4j.properties +{% endhighlight %} + +There is our `pom.xml` file that already has the Flink dependencies added in the root directory and +several example Flink programs in `src/main/java`. We can delete the example programs, since +we are going to start from scratch: + +{% highlight bash %} +$ rm wiki-edits/src/main/java/wikiedits/*.java +{% endhighlight %} + +As a last step we need to add the Flink Wikipedia connector as a dependency so that we can +use it in our program. Edit the `dependencies` section so that it looks like this: + +{% highlight xml %} +<dependencies> +<dependency> +<groupId>org.apache.flink</groupId> +<artifactId>flink-java</artifactId> +<version>${flink.version}</version> +</dependency> +<dependency> +<groupId>org.apache.flink</groupId> +<artifactId>flink-streaming-java_2.10</artifactId> +<version>${flink.version}</version> +</dependency> +<dependency> +<groupId>org.apache.flink</groupId> +<artifactId>flink-clients_2.10</artifactId> +<version>${flink.version}</version> +</dependency> +<dependency> +<groupId>org.apache.flink</groupId> +<artifactId>flink-connector-wikiedits_2.10</artifactId> +<version>${flink.version}</version> +</dependency> +</dependencies> +{% endhighlight %} + +Notice the `flink-connector-wikiedits_2.10` dependency that was added. + +## Writing a Flink Program + +It's coding time. Fire up your favorite IDE and import the Maven project or open a text editor and +create the file `src/main/java/wikiedits/WikipediaAnalysis.java`: + +{% highlight java %} +package wikiedits; + +public class WikipediaAnalysis { + +public static void main(String[] args) throws Exception { + +} +} +{% endhighlight %} + +I admit it's very bare bones now but we will fill it as we go. Note that I'll not give +import statements here since IDEs can add them automatically. At the end of this section I'll show +the complete code with import statements if you simply want to skip ahead and enter that in your +editor. 
+ +The first step in a Flink program is to create a `StreamExecutionEnvironment` +(or `ExecutionEnvironment` if you are writing a batch job). This can be used to set execution +parameters and create sources for reading from external systems. So let's go ahead and add +this to the main method: + +{% highlight java %} +StreamExecutionEnvironment see = StreamExecutionEnvironment.getExecutionEnvironment(); +{%
[GitHub] flink pull request: [FLINK-3591] Replace Quickstart K-Means Exampl...
Github user vasia commented on a diff in the pull request: https://github.com/apache/flink/pull/1774#discussion_r55371429 --- Diff: docs/quickstart/run_example_quickstart.md --- @@ -27,116 +27,360 @@ under the License. * This will be replaced by the TOC {:toc} -This guide walks you through the steps of executing an example program ([K-Means clustering](http://en.wikipedia.org/wiki/K-means_clustering)) on Flink. -On the way, you will see a visualization of the program, the optimized execution plan, and track the progress of its execution. +In this guide we will start from scratch and go from setting up a Flink project to running +a streaming analysis program on a Flink cluster. + +Wikipedia provides an IRC channel where all edits to the wiki are logged. We are going to +read this channel in Flink and count the number of bytes that each user edits within +a given window of time. This is easy enough to implement in a few minutes using Flink but it will +give you a good foundation from which to start building more complex analysis programs on your own. + +## Setting up a Maven Project + +We are going to use a Flink Maven Archetype for creating our project structure. Please +see [Java API Quickstart]({{ site.baseurl }}/quickstart/java_api_quickstart.html) for more details +about this. For our purposes, the command to run is this: + +{% highlight bash %} +$ mvn archetype:generate\ +-DarchetypeGroupId=org.apache.flink\ +-DarchetypeArtifactId=flink-quickstart-java\ +-DarchetypeVersion=1.0.0\ +-DgroupId=wiki-edits\ +-DartifactId=wiki-edits\ +-Dversion=0.1\ +-Dpackage=wikiedits\ +-DinteractiveMode=false +{% endhighlight %} + +You can edit the `groupId`, `artifactId` and `package` if you like. 
With the above parameters, +maven will create a project structure that looks like this: + +{% highlight bash %} +$ tree wiki-edits +wiki-edits/ +├── pom.xml +└── src +└── main +├── java +│ └── wikiedits +│ ├── Job.java +│ ├── SocketTextStreamWordCount.java +│ └── WordCount.java +└── resources +└── log4j.properties +{% endhighlight %} + +There is our `pom.xml` file that already has the Flink dependencies added in the root directory and +several example Flink programs in `src/main/java`. We can delete the example programs, since +we are going to start from scratch: + +{% highlight bash %} +$ rm wiki-edits/src/main/java/wikiedits/*.java +{% endhighlight %} + +As a last step we need to add the Flink Wikipedia connector as a dependency so that we can +use it in our program. Edit the `dependencies` section so that it looks like this: + +{% highlight xml %} +<dependencies> +<dependency> +<groupId>org.apache.flink</groupId> +<artifactId>flink-java</artifactId> +<version>${flink.version}</version> +</dependency> +<dependency> +<groupId>org.apache.flink</groupId> +<artifactId>flink-streaming-java_2.10</artifactId> +<version>${flink.version}</version> +</dependency> +<dependency> +<groupId>org.apache.flink</groupId> +<artifactId>flink-clients_2.10</artifactId> +<version>${flink.version}</version> +</dependency> +<dependency> +<groupId>org.apache.flink</groupId> +<artifactId>flink-connector-wikiedits_2.10</artifactId> +<version>${flink.version}</version> +</dependency> +</dependencies> +{% endhighlight %} + +Notice the `flink-connector-wikiedits_2.10` dependency that was added. + +## Writing a Flink Program + +It's coding time. Fire up your favorite IDE and import the Maven project or open a text editor and +create the file `src/main/java/wikiedits/WikipediaAnalysis.java`: + +{% highlight java %} +package wikiedits; + +public class WikipediaAnalysis { + +public static void main(String[] args) throws Exception { + +} +} +{% endhighlight %} + +I admit it's very bare bones now but we will fill it as we go. Note that I'll not give +import statements here since IDEs can add them automatically. At the end of this section I'll show +the complete code with import statements if you simply want to skip ahead and enter that in your +editor. 
+ +The first step in a Flink program is to create a `StreamExecutionEnvironment` +(or `ExecutionEnvironment` if you are writing a batch job). This can be used to set execution +parameters and create sources for reading from external systems. So let's go ahead and add +this to the main method: + +{% highlight java %} +StreamExecutionEnvironment see = StreamExecutionEnvironment.getExecutionEnvironment(); +{% endhighlight %} + +Next, we will create a source that reads from the Wikipedia IRC log: + +{% highlight java %} +DataStream edits =
[jira] [Commented] (FLINK-3591) Replace Quickstart K-Means Example by Streaming Example
[ https://issues.apache.org/jira/browse/FLINK-3591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15185039#comment-15185039 ] ASF GitHub Bot commented on FLINK-3591: --- Github user vasia commented on a diff in the pull request: https://github.com/apache/flink/pull/1774#discussion_r55371362 --- Diff: docs/quickstart/run_example_quickstart.md --- @@ -27,116 +27,360 @@ under the License. * This will be replaced by the TOC {:toc} -This guide walks you through the steps of executing an example program ([K-Means clustering](http://en.wikipedia.org/wiki/K-means_clustering)) on Flink. -On the way, you will see a visualization of the program, the optimized execution plan, and track the progress of its execution. +In this guide we will start from scratch and go from setting up a Flink project to running +a streaming analysis program on a Flink cluster. + +Wikipedia provides an IRC channel where all edits to the wiki are logged. We are going to +read this channel in Flink and count the number of bytes that each user edits within +a given window of time. This is easy enough to implement in a few minutes using Flink but it will +give you a good foundation from which to start building more complex analysis programs on your own. + +## Setting up a Maven Project + +We are going to use a Flink Maven Archetype for creating our project structure. Please +see [Java API Quickstart]({{ site.baseurl }}/quickstart/java_api_quickstart.html) for more details +about this. For our purposes, the command to run is this: + +{% highlight bash %} +$ mvn archetype:generate\ +-DarchetypeGroupId=org.apache.flink\ +-DarchetypeArtifactId=flink-quickstart-java\ +-DarchetypeVersion=1.0.0\ +-DgroupId=wiki-edits\ +-DartifactId=wiki-edits\ +-Dversion=0.1\ +-Dpackage=wikiedits\ +-DinteractiveMode=false +{% endhighlight %} + +You can edit the `groupId`, `artifactId` and `package` if you like. 
With the above parameters, +maven will create a project structure that looks like this: + +{% highlight bash %} +$ tree wiki-edits +wiki-edits/ +├── pom.xml +└── src +└── main +├── java +│ └── wikiedits +│ ├── Job.java +│ ├── SocketTextStreamWordCount.java +│ └── WordCount.java +└── resources +└── log4j.properties +{% endhighlight %} + +There is our `pom.xml` file that already has the Flink dependencies added in the root directory and +several example Flink programs in `src/main/java`. We can delete the example programs, since +we are going to start from scratch: + +{% highlight bash %} +$ rm wiki-edits/src/main/java/wikiedits/*.java +{% endhighlight %} + +As a last step we need to add the Flink Wikipedia connector as a dependency so that we can +use it in our program. Edit the `dependencies` section so that it looks like this: + +{% highlight xml %} +<dependencies> +<dependency> +<groupId>org.apache.flink</groupId> +<artifactId>flink-java</artifactId> +<version>${flink.version}</version> +</dependency> +<dependency> +<groupId>org.apache.flink</groupId> +<artifactId>flink-streaming-java_2.10</artifactId> +<version>${flink.version}</version> +</dependency> +<dependency> +<groupId>org.apache.flink</groupId> +<artifactId>flink-clients_2.10</artifactId> +<version>${flink.version}</version> +</dependency> +<dependency> +<groupId>org.apache.flink</groupId> +<artifactId>flink-connector-wikiedits_2.10</artifactId> +<version>${flink.version}</version> +</dependency> +</dependencies> +{% endhighlight %} + +Notice the `flink-connector-wikiedits_2.10` dependency that was added. + +## Writing a Flink Program + +It's coding time. Fire up your favorite IDE and import the Maven project or open a text editor and +create the file `src/main/java/wikiedits/WikipediaAnalysis.java`: + +{% highlight java %} +package wikiedits; + +public class WikipediaAnalysis { + +public static void main(String[] args) throws Exception { + +} +} +{% endhighlight %} + +I admit it's very bare bones now but we will fill it as we go. Note that I'll not give +import statements here since IDEs can add them automatically. At the end of this section I'll show +the complete code with import statements if you simply want to skip ahead and enter that in your
+editor. 
+ +The first step in a Flink program is to create a `StreamExecutionEnvironment` +(or `ExecutionEnvironment` if you are writing a batch job). This can be used to set execution +parameters and create sources for reading from external systems. So let's go ahead, add +this to the main method: + +{% highlight java %} +StreamExecutionEnvironment see = StreamExecutionEnvironment.getExecutionEnvironment(); +{%
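Assembled as one file, the program the quoted guide has built so far would look roughly like the sketch below. The import paths for the connector classes (`WikipediaEditsSource`, `WikipediaEditEvent`) are my assumption based on the `flink-connector-wikiedits` artifact, and the `print()` sink is added here only so the sketch is a complete, runnable job rather than the guide's intermediate state:

```java
package wikiedits;

import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.connectors.wikiedits.WikipediaEditEvent;
import org.apache.flink.streaming.connectors.wikiedits.WikipediaEditsSource;

public class WikipediaAnalysis {

    public static void main(String[] args) throws Exception {
        // Entry point for any streaming program: the execution environment.
        StreamExecutionEnvironment see = StreamExecutionEnvironment.getExecutionEnvironment();

        // Source that connects to the Wikipedia IRC channel and emits one event per edit.
        DataStream<WikipediaEditEvent> edits = see.addSource(new WikipediaEditsSource());

        // Print sink so the job does something observable; the guide later builds
        // the windowed per-user byte count on top of this stream.
        edits.print();

        see.execute("Wikipedia edit stream");
    }
}
```

Running this requires `flink-streaming-java` and the wikiedits connector on the classpath, as set up in the `pom.xml` above.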
[GitHub] flink pull request: [FLINK-3591] Replace Quickstart K-Means Exampl...
GitHub user aljoscha opened a pull request: https://github.com/apache/flink/pull/1774 [FLINK-3591] Replace Quickstart K-Means Example by Streaming Example I attached a rough PDF rendering of the changed page. [Apache Flink 1.1-SNAPSHOT Documentation: Quick Start: Monitoring the Wikipedia Edit Stream.pdf](https://github.com/apache/flink/files/163347/Apache.Flink.1.1-SNAPSHOT.Documentation.Quick.Start.Monitoring.the.Wikipedia.Edit.Stream.pdf) You can merge this pull request into a Git repository by running: $ git pull https://github.com/aljoscha/flink doc-fix-quickstart-example Alternatively you can review and apply these changes as the patch at: https://github.com/apache/flink/pull/1774.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #1774 commit ce53fa70512d7817aed64ec30bd056567c7c4f55 Author: Aljoscha Krettek Date: 2016-03-08T14:49:09Z [FLINK-3591] Replace Quickstart K-Means Example by Streaming Example --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[jira] [Commented] (FLINK-3591) Replace Quickstart K-Means Example by Streaming Example
[ https://issues.apache.org/jira/browse/FLINK-3591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15185041#comment-15185041 ] ASF GitHub Bot commented on FLINK-3591: --- Github user vasia commented on a diff in the pull request: https://github.com/apache/flink/pull/1774#discussion_r55371429 --- Diff: docs/quickstart/run_example_quickstart.md --- (quotes the same excerpt of the quickstart page as the comment above)
[jira] [Commented] (FLINK-3444) env.fromElements relies on the first input element for determining the DataSet/DataStream type
[ https://issues.apache.org/jira/browse/FLINK-3444?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15185030#comment-15185030 ] Chesnay Schepler commented on FLINK-3444: - We could add a fromElements method that takes a TypeInformation or Class argument in addition to the values. This should be the easiest and safest way to solve this issue.
> env.fromElements relies on the first input element for determining the DataSet/DataStream type
> Key: FLINK-3444
> URL: https://issues.apache.org/jira/browse/FLINK-3444
> Project: Flink
> Issue Type: Bug
> Components: DataSet API, DataStream API
> Affects Versions: 0.10.0, 1.0.0
> Reporter: Vasia Kalavri
>
> The {{fromElements}} method of the {{ExecutionEnvironment}} and {{StreamExecutionEnvironment}} determines the DataSet/DataStream type by extracting the type of the first input element.
> This is problematic if the first element is a subtype of another element in the collection.
> For example, the following
> {code}
> DataStream<Event> input = env.fromElements(new Event(1, "a"), new SubEvent(2, "b"));
> {code}
> succeeds, while the following
> {code}
> DataStream<Event> input = env.fromElements(new SubEvent(1, "a"), new Event(2, "b"));
> {code}
> fails with "java.lang.IllegalArgumentException: The elements in the collection are not all subclasses of SubEvent".
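The order sensitivity described in the issue, and the fix Chesnay suggests, can be mimicked without Flink. A hypothetical sketch in plain Java (not Flink's actual implementation): `inferFromFirst` takes the element type from the first element the way `fromElements` does, while `withExplicitType` takes a `Class` argument as proposed:

```java
// Hypothetical sketch (plain Java, not Flink's actual code) of why inferring
// the element type from the first element is order-sensitive, and how an
// explicit Class argument avoids the problem.
public class FromElementsSketch {

    static class Event { }
    static class SubEvent extends Event { }

    // Mimics the current behavior: the element type is taken from the first element.
    static Class<?> inferFromFirst(Object... elements) {
        Class<?> type = elements[0].getClass();
        for (Object e : elements) {
            if (!type.isInstance(e)) {
                throw new IllegalArgumentException(
                        "The elements in the collection are not all subclasses of "
                                + type.getSimpleName());
            }
        }
        return type;
    }

    // Mimics the proposed fix: the caller supplies the element type explicitly,
    // so the order of the elements no longer matters.
    static <T> Class<T> withExplicitType(Class<T> type, Object... elements) {
        for (Object e : elements) {
            if (!type.isInstance(e)) {
                throw new IllegalArgumentException(
                        "Element " + e + " is not a subclass of " + type.getSimpleName());
            }
        }
        return type;
    }

    public static void main(String[] args) {
        // Supertype first: inference happens to work.
        System.out.println(inferFromFirst(new Event(), new SubEvent()).getSimpleName());

        // Subtype first: inference picks SubEvent and then rejects the Event.
        try {
            inferFromFirst(new SubEvent(), new Event());
        } catch (IllegalArgumentException e) {
            System.out.println("failed: " + e.getMessage());
        }

        // Explicit type: both orders are accepted.
        System.out.println(
                withExplicitType(Event.class, new SubEvent(), new Event()).getSimpleName());
    }
}
```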
[jira] [Updated] (FLINK-3519) Subclasses of Tuples don't work if the declared type of a DataSet is not the descendant
[ https://issues.apache.org/jira/browse/FLINK-3519?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chesnay Schepler updated FLINK-3519: Assignee: Gabor Gevay
> Subclasses of Tuples don't work if the declared type of a DataSet is not the descendant
> Key: FLINK-3519
> URL: https://issues.apache.org/jira/browse/FLINK-3519
> Project: Flink
> Issue Type: Bug
> Components: Type Serialization System
> Affects Versions: 1.0.0
> Reporter: Gabor Gevay
> Assignee: Gabor Gevay
> Priority: Minor
>
> If I have a subclass of TupleN, then objects of this type will turn into TupleNs when I try to use them in a DataSet.
> For example, if I have a class like this:
> {code}
> public static class Foo extends Tuple1<Integer> {
>     public short a;
>     public Foo() {}
>     public Foo(int f0, int a) {
>         this.f0 = f0;
>         this.a = (short) a;
>     }
>     @Override
>     public String toString() {
>         return "(" + f0 + ", " + a + ")";
>     }
> }
> {code}
> And then I do this:
> {code}
> env.fromElements(0, 0, 0).map(new MapFunction<Integer, Tuple1<Integer>>() {
>     @Override
>     public Tuple1<Integer> map(Integer value) throws Exception {
>         return new Foo(5, 6);
>     }
> }).print();
> {code}
> Then I don't have Foos in the output, but only Tuples:
> {code}
> (5)
> (5)
> (5)
> {code}
> The problem is caused by the TupleSerializer not caring about subclasses at all. I guess the reason for this is performance: we don't want to deal with writing and reading subclass tags when we have Tuples.
> I see three options for solving this:
> 1. Add subclass tags to the TupleSerializer: this is not really an option, because we don't want to lose performance.
> 2. Document this behavior in the javadoc of the Tuple classes.
> 3. Make the Tuple types final: this would be the clean solution, but it is API breaking, and the first victim would be Gelly: the Vertex and Edge types extend from Tuples. (Note that the issue doesn't appear there, because the DataSets there always have the type of the descendant class.)
> When deciding between 2. and 3., an important point to note is that if you have your class extend from a Tuple type instead of just adding the f0, f1, ... fields manually in the hopes of getting the performance boost associated with Tuples, then you are out of luck: the PojoSerializer will kick in anyway when the declared types of your DataSets are the descendant type.
> If someone knows about a good reason to extend from a Tuple class, then please comment.
> For 2., this is a suggested wording for the javadoc of the Tuple classes:
> Warning: please don't subclass Tuple classes, but if you do, then be sure to always declare the element type of your DataSets to be your descendant type. (That is, if you have a "class A extends Tuple2", then don't use instances of A in a DataSet<Tuple2>, but use a DataSet<A>.)
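The field loss described in the issue can be mimicked without Flink. A minimal sketch, where `Tuple1`, `Foo` and `roundTrip` are hypothetical stand-ins for Flink's `Tuple1` and the `TupleSerializer` round trip, not the real classes:

```java
// Hypothetical stand-ins (not Flink's actual Tuple1/TupleSerializer) showing
// the mechanism behind the lost field: a serializer that only knows the
// declared tuple fields reconstructs the base class on the way back, so the
// subclass field `a` is silently dropped.
public class TupleSubclassSketch {

    static class Tuple1 {              // stand-in for Flink's Tuple1<T>
        Object f0;
    }

    static class Foo extends Tuple1 {  // Tuple subclass with an extra field
        short a;
    }

    // Mimics a serialize/deserialize round trip of a tuple-only serializer:
    // it copies f0 and instantiates the base Tuple1, not the subclass.
    static Tuple1 roundTrip(Tuple1 t) {
        Tuple1 copy = new Tuple1();
        copy.f0 = t.f0;                // only the declared tuple field survives
        return copy;
    }

    public static void main(String[] args) {
        Foo foo = new Foo();
        foo.f0 = 5;
        foo.a = 6;
        Tuple1 result = roundTrip(foo);
        System.out.println(result instanceof Foo);  // subclass identity is gone
        System.out.println(result.f0);              // the tuple field is kept
    }
}
```

This is why declaring the DataSet with the descendant type matters: only then does the POJO path, which knows about the extra fields, handle the objects.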
[jira] [Commented] (FLINK-3591) Replace Quickstart K-Means Example by Streaming Example
[ https://issues.apache.org/jira/browse/FLINK-3591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15185032#comment-15185032 ] ASF GitHub Bot commented on FLINK-3591: --- Github user vasia commented on a diff in the pull request: https://github.com/apache/flink/pull/1774#discussion_r55371080 --- Diff: docs/quickstart/run_example_quickstart.md --- (quotes the same excerpt of the quickstart page as the comment above)
[jira] [Commented] (FLINK-3591) Replace Quickstart K-Means Example by Streaming Example
[ https://issues.apache.org/jira/browse/FLINK-3591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15185026#comment-15185026 ] ASF GitHub Bot commented on FLINK-3591: --- Github user vasia commented on a diff in the pull request: https://github.com/apache/flink/pull/1774#discussion_r55370936 --- Diff: docs/quickstart/run_example_quickstart.md --- (quotes the same excerpt of the quickstart page as the comment above)
[jira] [Commented] (FLINK-3591) Replace Quickstart K-Means Example by Streaming Example
[ https://issues.apache.org/jira/browse/FLINK-3591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15185018#comment-15185018 ] ASF GitHub Bot commented on FLINK-3591: --- Github user vasia commented on a diff in the pull request: https://github.com/apache/flink/pull/1774#discussion_r55370299 --- Diff: docs/quickstart/run_example_quickstart.md --- @@ -27,116 +27,360 @@ under the License. * This will be replaced by the TOC {:toc} -This guide walks you through the steps of executing an example program ([K-Means clustering](http://en.wikipedia.org/wiki/K-means_clustering)) on Flink. -On the way, you will see the a visualization of the program, the optimized execution plan, and track the progress of its execution. +In this guide we will start from scratch and fo from setting up a Flink project and running +a streaming analysis program on a Flink cluster. + +Wikipedia provides an IRC channel where all edits to the wiki are logged. We are going to +read this channel in Flink and count the number of bytes that each user edits within +a given window of time. This is easy enough to implement in a few minutes using Flink but it will +give you a good foundation from which to start building more complex analysis programs on your own. + +## Setting up a Maven Project + +We are going to use a Flink Maven Archetype for creating our project stucture. Please +see [Java API Quickstart]({{ site.baseurl }}/quickstart/java_api_quickstart.html) for more details +about this. For our purposes, the command to run is this: + +{% highlight bash %} +$ mvn archetype:generate\ +-DarchetypeGroupId=org.apache.flink\ +-DarchetypeArtifactId=flink-quickstart-java\ +-DarchetypeVersion=1.0.0\ +-DgroupId=wiki-edits\ +-DartifactId=wiki-edits\ +-Dversion=0.1\ +-Dpackage=wikiedits\ +-DinteractiveMode=false\ +{% endhighlight %} + +You can edit the `groupId`, `artifactId` and `package` if you like. 
With the above parameters, +maven will create a project structure that looks like this: + +{% highlight bash %} +$ tree wiki-edits +wiki-edits/ +├── pom.xml +└── src +└── main +├── java +│ └── wikiedits +│ ├── Job.java +│ ├── SocketTextStreamWordCount.java +│ └── WordCount.java +└── resources +└── log4j.properties +{% endhighlight %} + +There is our `pom.xml` file that already has the Flink dependencies added in the root directory and +several example Flink programs in `src/main/java`. We can delete the example programs, since +we are going to start from scratch: + +{% highlight bash %} +$ rm wiki-edits/src/main/java/wikiedits/*.java +{% endhighlight %} + +As a last step we need to add the Flink wikipedia connector as a dependency so that we can +use it in our program. Edit the `dependencies` section so that it looks like this: + +{% highlight xml %} +<dependencies> +<dependency> +<groupId>org.apache.flink</groupId> +<artifactId>flink-java</artifactId> +<version>${flink.version}</version> +</dependency> +<dependency> +<groupId>org.apache.flink</groupId> +<artifactId>flink-streaming-java_2.10</artifactId> +<version>${flink.version}</version> +</dependency> +<dependency> +<groupId>org.apache.flink</groupId> +<artifactId>flink-clients_2.10</artifactId> +<version>${flink.version}</version> +</dependency> +<dependency> +<groupId>org.apache.flink</groupId> +<artifactId>flink-connector-wikiedits_2.10</artifactId> +<version>${flink.version}</version> +</dependency> +</dependencies> +{% endhighlight %} + +Notice the `flink-connector-wikiedits_2.10` dependency that was added. + +## Writing a Flink Program + +It's coding time. Fire up your favorite IDE and import the Maven project or open a text editor and +create the file `src/main/wikiedits/WikipediaAnalysis.java`: --- End diff -- src/main/java/...? > Replace Quickstart K-Means Example by Streaming Example > --- > > Key: FLINK-3591 > URL: https://issues.apache.org/jira/browse/FLINK-3591 > Project: Flink > Issue Type: Improvement > Components: Documentation >Reporter: Aljoscha Krettek >Assignee: Aljoscha Krettek > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
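The analysis the quoted guide is building toward — summing the byte diff of each user's edits within a time window — can be sketched independently of the Flink APIs as a plain per-key fold over one window's worth of events. The `(user, byteDiff)` event shape and all names below are illustrative, not the wikiedits connector's actual types:

```java
import java.util.HashMap;
import java.util.Map;

public class WindowedByteCount {

    // Sum the byte diff per user, as the keyed time window in the
    // guide would do for one window's worth of Wikipedia edit events.
    static Map<String, Long> sumByUser(String[][] events) {
        Map<String, Long> totals = new HashMap<>();
        for (String[] e : events) {
            // e[0] = user name, e[1] = byte diff of the edit
            totals.merge(e[0], Long.parseLong(e[1]), Long::sum);
        }
        return totals;
    }

    public static void main(String[] args) {
        String[][] window = {
            {"alice", "120"}, {"bob", "-30"}, {"alice", "15"}
        };
        Map<String, Long> totals = sumByUser(window);
        System.out.println(totals.get("alice") + " " + totals.get("bob"));
    }
}
```

In the actual program this fold is what the keyed window aggregation computes; the Flink version differs mainly in that the runtime feeds it events and decides the window boundaries.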
[GitHub] flink pull request: [FLINK-3591] Replace Quickstart K-Means Exampl...
Github user vasia commented on a diff in the pull request: https://github.com/apache/flink/pull/1774#discussion_r55369437 --- Diff: docs/quickstart/run_example_quickstart.md --- @@ -27,116 +27,360 @@ under the License. * This will be replaced by the TOC {:toc} -This guide walks you through the steps of executing an example program ([K-Means clustering](http://en.wikipedia.org/wiki/K-means_clustering)) on Flink. -On the way, you will see the a visualization of the program, the optimized execution plan, and track the progress of its execution. +In this guide we will start from scratch and fo from setting up a Flink project and running --- End diff -- fo => go --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[jira] [Commented] (FLINK-3591) Replace Quickstart K-Means Example by Streaming Example
[ https://issues.apache.org/jira/browse/FLINK-3591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15185009#comment-15185009 ] ASF GitHub Bot commented on FLINK-3591: --- Github user vasia commented on a diff in the pull request: https://github.com/apache/flink/pull/1774#discussion_r55369437 --- Diff: docs/quickstart/run_example_quickstart.md --- @@ -27,116 +27,360 @@ under the License. * This will be replaced by the TOC {:toc} -This guide walks you through the steps of executing an example program ([K-Means clustering](http://en.wikipedia.org/wiki/K-means_clustering)) on Flink. -On the way, you will see the a visualization of the program, the optimized execution plan, and track the progress of its execution. +In this guide we will start from scratch and fo from setting up a Flink project and running --- End diff -- fo => go > Replace Quickstart K-Means Example by Streaming Example > --- > > Key: FLINK-3591 > URL: https://issues.apache.org/jira/browse/FLINK-3591 > Project: Flink > Issue Type: Improvement > Components: Documentation >Reporter: Aljoscha Krettek >Assignee: Aljoscha Krettek > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-3591) Replace Quickstart K-Means Example by Streaming Example
[ https://issues.apache.org/jira/browse/FLINK-3591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15185000#comment-15185000 ] ASF GitHub Bot commented on FLINK-3591: --- GitHub user aljoscha opened a pull request: https://github.com/apache/flink/pull/1774 [FLINK-3591] Replace Quickstart K-Means Example by Streaming Example I attached a rough pdf rendering of the changed page. [Apache Flink 1.1-SNAPSHOT Documentation: Quick Start: Monitoring the Wikipedia Edit Stream.pdf](https://github.com/apache/flink/files/163347/Apache.Flink.1.1-SNAPSHOT.Documentation.Quick.Start.Monitoring.the.Wikipedia.Edit.Stream.pdf) You can merge this pull request into a Git repository by running: $ git pull https://github.com/aljoscha/flink doc-fix-quickstart-example Alternatively you can review and apply these changes as the patch at: https://github.com/apache/flink/pull/1774.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #1774 commit ce53fa70512d7817aed64ec30bd056567c7c4f55 Author: Aljoscha KrettekDate: 2016-03-08T14:49:09Z [FLINK-3591] Replace Quickstart K-Means Example by Streaming Example > Replace Quickstart K-Means Example by Streaming Example > --- > > Key: FLINK-3591 > URL: https://issues.apache.org/jira/browse/FLINK-3591 > Project: Flink > Issue Type: Improvement > Components: Documentation >Reporter: Aljoscha Krettek >Assignee: Aljoscha Krettek > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (FLINK-3591) Replace Quickstart K-Means Example by Streaming Example
Aljoscha Krettek created FLINK-3591: --- Summary: Replace Quickstart K-Means Example by Streaming Example Key: FLINK-3591 URL: https://issues.apache.org/jira/browse/FLINK-3591 Project: Flink Issue Type: Improvement Components: Documentation Reporter: Aljoscha Krettek Assignee: Aljoscha Krettek -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-3590) JDBC Format tests don't hide derby logs
[ https://issues.apache.org/jira/browse/FLINK-3590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15184936#comment-15184936 ] ASF GitHub Bot commented on FLINK-3590: --- GitHub user zentol opened a pull request: https://github.com/apache/flink/pull/1773 [FLINK-3590] Hide derby log in JDBC tests Hides derby logs by redirecting the log stream to a dummy stream You can merge this pull request into a Git repository by running: $ git pull https://github.com/zentol/flink 3590_jdbc_derby Alternatively you can review and apply these changes as the patch at: https://github.com/apache/flink/pull/1773.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #1773 commit c998b29391c08ddd359fdaf0574c063e48773678 Author: zentolDate: 2016-03-08T14:02:14Z [FLINK-3590] Hide derby log in JDBC tests > JDBC Format tests don't hide derby logs > --- > > Key: FLINK-3590 > URL: https://issues.apache.org/jira/browse/FLINK-3590 > Project: Flink > Issue Type: Improvement > Components: Batch, Tests >Affects Versions: 1.0.0 >Reporter: Chesnay Schepler >Assignee: Chesnay Schepler >Priority: Minor > Fix For: 1.1.0 > > > The batch jdbc IO-format tests print the derby log, due to some refactoring > going wrong. 
> {code} > T E S T S > --- > Running org.apache.flink.api.java.io.jdbc.JDBCInputFormatTest > Tue Mar 08 12:17:03 UTC 2016 Thread[main,5,main] > java.lang.ClassNotFoundException: > org.apache.flink.api.java.record.io.jdbc.DevNullLogStream > > Tue Mar 08 12:17:03 UTC 2016: > Booting Derby version The Apache Software Foundation - Apache Derby - > 10.10.1.1 - (1458268): instance a816c00e-0153-5628-ba57-0d988ed0 > on database directory > memory:/home/travis/build/zentol/flink/flink-batch-connectors/flink-jdbc/ebookshop > with class loader sun.misc.Launcher$AppClassLoader@788bf135 > Loaded from > file:/home/travis/.m2/repository/org/apache/derby/derby/10.10.1.1/derby-10.10.1.1.jar > java.vendor=Oracle Corporation > java.runtime.version=1.7.0_76-b13 > user.dir=/home/travis/build/zentol/flink/flink-batch-connectors/flink-jdbc > os.name=Linux > os.arch=amd64 > os.version=3.13.0-40-generic > derby.system.home=null > derby.stream.error.field=org.apache.flink.api.java.record.io.jdbc.DevNullLogStream.DEV_NULL > Database Class Loader started - derby.database.classpath='' > {code} > We should hide these logs again. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
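The redirection described in the PR above typically uses Derby's `derby.stream.error.field` convention: the property names a public static `OutputStream` field, and Derby boots with its log wired to that stream instead of `derby.log`. A minimal sketch — the class and field names are illustrative, not necessarily Flink's actual test utility:

```java
import java.io.OutputStream;

public class DevNullLogStream {

    // Derby sends its log to this stream when the
    // "derby.stream.error.field" system property names it.
    public static final OutputStream DEV_NULL = new OutputStream() {
        @Override
        public void write(int b) {
            // discard every byte Derby writes
        }
    };

    public static void main(String[] args) throws Exception {
        // Must be set before the Derby driver boots; Derby resolves
        // the named static field via reflection.
        System.setProperty("derby.stream.error.field",
                "DevNullLogStream.DEV_NULL");
        DEV_NULL.write('x'); // no-op: nothing is emitted anywhere
        System.out.println(System.getProperty("derby.stream.error.field"));
    }
}
```

The earlier `ClassNotFoundException` in the quoted log shows what happens when the property points at a class that no longer exists after refactoring: Derby falls back to its default logging.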
[jira] [Created] (FLINK-3590) JDBC Format tests don't hide derby logs
Chesnay Schepler created FLINK-3590: --- Summary: JDBC Format tests don't hide derby logs Key: FLINK-3590 URL: https://issues.apache.org/jira/browse/FLINK-3590 Project: Flink Issue Type: Improvement Components: Batch, Tests Affects Versions: 1.0.0 Reporter: Chesnay Schepler Assignee: Chesnay Schepler Priority: Minor Fix For: 1.1.0 The batch jdbc IO-format tests print the derby log, due to some refactoring going wrong. {code} T E S T S --- Running org.apache.flink.api.java.io.jdbc.JDBCInputFormatTest Tue Mar 08 12:17:03 UTC 2016 Thread[main,5,main] java.lang.ClassNotFoundException: org.apache.flink.api.java.record.io.jdbc.DevNullLogStream Tue Mar 08 12:17:03 UTC 2016: Booting Derby version The Apache Software Foundation - Apache Derby - 10.10.1.1 - (1458268): instance a816c00e-0153-5628-ba57-0d988ed0 on database directory memory:/home/travis/build/zentol/flink/flink-batch-connectors/flink-jdbc/ebookshop with class loader sun.misc.Launcher$AppClassLoader@788bf135 Loaded from file:/home/travis/.m2/repository/org/apache/derby/derby/10.10.1.1/derby-10.10.1.1.jar java.vendor=Oracle Corporation java.runtime.version=1.7.0_76-b13 user.dir=/home/travis/build/zentol/flink/flink-batch-connectors/flink-jdbc os.name=Linux os.arch=amd64 os.version=3.13.0-40-generic derby.system.home=null derby.stream.error.field=org.apache.flink.api.java.record.io.jdbc.DevNullLogStream.DEV_NULL Database Class Loader started - derby.database.classpath='' {code} We should hide these logs again. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-3574) Implement math functions for Table API
[ https://issues.apache.org/jira/browse/FLINK-3574?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15184935#comment-15184935 ] Dawid Wysakowicz commented on FLINK-3574: - Ok, I understand now. I am gonna implement it that way later today. > Implement math functions for Table API > -- > > Key: FLINK-3574 > URL: https://issues.apache.org/jira/browse/FLINK-3574 > Project: Flink > Issue Type: Sub-task > Components: Table API >Reporter: Timo Walther >Assignee: Dawid Wysakowicz > > {code} > MOD > EXP > POWER > LN > LOG10 > ABS > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (FLINK-3589) Allow setting Operator parallelism to default value
Greg Hogan created FLINK-3589: - Summary: Allow setting Operator parallelism to default value Key: FLINK-3589 URL: https://issues.apache.org/jira/browse/FLINK-3589 Project: Flink Issue Type: Improvement Components: Java API Affects Versions: 1.1.0 Reporter: Greg Hogan Assignee: Greg Hogan Priority: Minor Users can override the parallelism for a single operator by calling {{Operator.setParallelism}}, which accepts a positive value. {{Operator}} uses {{-1}} to indicate default parallelism. It would be nice to name and accept this default value. This would enable user algorithms to allow configurable parallelism while still chaining operator methods. For example, currently: {code} private int parallelism; ... public void setParallelism(int parallelism) { this.parallelism = parallelism; } ... MapOperator, Edge > newEdges = edges .map(new MyMapFunction()) .name("My map function"); if (parallelism > 0) { newEdges.setParallelism(parallelism); } {code} Could be simplified to: {code} private int parallelism = Operator.DEFAULT_PARALLELISM; ... public void setParallelism(int parallelism) { this.parallelism = parallelism; } ... DataSet > newEdges = edges .map(new MyMapFunction()) .setParallelism(parallelism) .name("My map function"); {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
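The simplification proposed above rests on a sentinel-constant pattern: `setParallelism` also accepts a named default value and returns `this`, so callers can chain unconditionally. A self-contained sketch under assumed names — `DEFAULT_PARALLELISM` is the ticket's suggested constant, and the validation details are illustrative:

```java
public class OperatorSketch {

    // Named sentinel for "use the environment's default parallelism",
    // mirroring the -1 that Operator uses internally.
    public static final int DEFAULT_PARALLELISM = -1;

    private int parallelism = DEFAULT_PARALLELISM;

    public OperatorSketch setParallelism(int parallelism) {
        if (parallelism != DEFAULT_PARALLELISM && parallelism < 1) {
            throw new IllegalArgumentException(
                    "parallelism must be positive or the named default");
        }
        this.parallelism = parallelism;
        return this; // returning this is what keeps method chaining possible
    }

    public int getParallelism() {
        return parallelism;
    }

    public static void main(String[] args) {
        // No if-branch needed: the default is just another accepted value.
        OperatorSketch op = new OperatorSketch()
                .setParallelism(OperatorSketch.DEFAULT_PARALLELISM);
        System.out.println(op.getParallelism());
    }
}
```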
[jira] [Comment Edited] (FLINK-3574) Implement math functions for Table API
[ https://issues.apache.org/jira/browse/FLINK-3574?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15184889#comment-15184889 ] Dawid Wysakowicz edited comment on FLINK-3574 at 3/8/16 1:11 PM: - I thought about a logic as described by you in {{getCallGenerator}}, but right know we can't specify signature with NUMERIC type cause {{NumericTypeInfo}} is an abstract class thus it cannot be instantiated. To give an example I would like to write sth like, which unfortunately is impossible: {code} addSqlFunctionMethod( LOG10, Seq(NUMERIC_TYPE_INFO), DOUBLE_TYPE_INFO, BuiltInMethod.Log10.method) {code} Regarding the two ways of processing I meant sth different. Have a look at those two code snippets: RexNodeTranslator: {code} case UnaryMinus(child) => val c = toRexNode(child, relBuilder) relBuilder.call(SqlStdOperatorTable.UNARY_MINUS, c) // Scalar functions case Call(name, args@_*) => val rexArgs = args.map(toRexNode(_, relBuilder)) val sqlOperator = toSqlOperator(name) relBuilder.call(sqlOperator, rexArgs) {code} CodeGenerator: {code} case OR => operands.reduceLeft { (left: GeneratedExpression, right: GeneratedExpression) => requireBoolean(left) requireBoolean(right) generateOr(nullCheck, left, right) } case call: SqlOperator => val callGen = ScalarFunctions.getCallGenerator(call, operands.map(_.resultType)) callGen .getOrElse(throw new CodeGenException(s"Unsupported call: $call")) .generate(this, operands) {code} In both snippets UnaryMinus and OR could be expressed in the case call block using appropriate CallGenerator. My question is should we replace whole match block with the invocation to {{getCallGenerator}} was (Author: dawidwys): I thought about a logic as described by you in {{getCallGenerator}}, but right know we can't specify signature with NUMERIC type cause {{NumericTypeInfo}} is an abstract class thus it cannot be instantiated. Regarding the two ways of processing I meant sth different. 
Have a look at those two code snippets: RexNodeTranslator: {code} case UnaryMinus(child) => val c = toRexNode(child, relBuilder) relBuilder.call(SqlStdOperatorTable.UNARY_MINUS, c) // Scalar functions case Call(name, args@_*) => val rexArgs = args.map(toRexNode(_, relBuilder)) val sqlOperator = toSqlOperator(name) relBuilder.call(sqlOperator, rexArgs) {code} CodeGenerator: {code} case OR => operands.reduceLeft { (left: GeneratedExpression, right: GeneratedExpression) => requireBoolean(left) requireBoolean(right) generateOr(nullCheck, left, right) } case call: SqlOperator => val callGen = ScalarFunctions.getCallGenerator(call, operands.map(_.resultType)) callGen .getOrElse(throw new CodeGenException(s"Unsupported call: $call")) .generate(this, operands) {code} In both snippets UnaryMinus and OR could be expressed in the case call block using appropriate CallGenerator. My question is should we replace whole match block with the invocation to {{getCallGenerator}} > Implement math functions for Table API > -- > > Key: FLINK-3574 > URL: https://issues.apache.org/jira/browse/FLINK-3574 > Project: Flink > Issue Type: Sub-task > Components: Table API >Reporter: Timo Walther >Assignee: Dawid Wysakowicz > > {code} > MOD > EXP > POWER > LN > LOG10 > ABS > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (FLINK-3573) Implement more String functions for Table API
[ https://issues.apache.org/jira/browse/FLINK-3573?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Timo Walther resolved FLINK-3573. - Resolution: Implemented Fix Version/s: 1.1.0 > Implement more String functions for Table API > - > > Key: FLINK-3573 > URL: https://issues.apache.org/jira/browse/FLINK-3573 > Project: Flink > Issue Type: Sub-task > Components: Table API >Reporter: Timo Walther >Assignee: Timo Walther > Fix For: 1.1.0 > > > Implement the remaining string functions: > {code} > CHARACTER_LENGTH > CONCAT > UPPER > LOWER > INITCAP > LIKE > SIMILAR > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-3574) Implement math functions for Table API
[ https://issues.apache.org/jira/browse/FLINK-3574?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15184889#comment-15184889 ] Dawid Wysakowicz commented on FLINK-3574: - I thought about a logic as described by you in {{getCallGenerator}}, but right now we can't specify a signature with the NUMERIC type because {{NumericTypeInfo}} is an abstract class and thus cannot be instantiated. Regarding the two ways of processing, I meant something different. Have a look at these two code snippets: RexNodeTranslator: {code} case UnaryMinus(child) => val c = toRexNode(child, relBuilder) relBuilder.call(SqlStdOperatorTable.UNARY_MINUS, c) // Scalar functions case Call(name, args@_*) => val rexArgs = args.map(toRexNode(_, relBuilder)) val sqlOperator = toSqlOperator(name) relBuilder.call(sqlOperator, rexArgs) {code} CodeGenerator: {code} case OR => operands.reduceLeft { (left: GeneratedExpression, right: GeneratedExpression) => requireBoolean(left) requireBoolean(right) generateOr(nullCheck, left, right) } case call: SqlOperator => val callGen = ScalarFunctions.getCallGenerator(call, operands.map(_.resultType)) callGen .getOrElse(throw new CodeGenException(s"Unsupported call: $call")) .generate(this, operands) {code} In both snippets, UnaryMinus and OR could be expressed in the case call block using an appropriate CallGenerator. My question is: should we replace the whole match block with the invocation of {{getCallGenerator}}? > Implement math functions for Table API > -- > > Key: FLINK-3574 > URL: https://issues.apache.org/jira/browse/FLINK-3574 > Project: Flink > Issue Type: Sub-task > Components: Table API >Reporter: Timo Walther >Assignee: Dawid Wysakowicz > > {code} > MOD > EXP > POWER > LN > LOG10 > ABS > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-3573) Implement more String functions for Table API
[ https://issues.apache.org/jira/browse/FLINK-3573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15184884#comment-15184884 ] ASF GitHub Bot commented on FLINK-3573: --- Github user twalthr closed the pull request at: https://github.com/apache/flink/pull/1766 > Implement more String functions for Table API > - > > Key: FLINK-3573 > URL: https://issues.apache.org/jira/browse/FLINK-3573 > Project: Flink > Issue Type: Sub-task > Components: Table API >Reporter: Timo Walther >Assignee: Timo Walther > > Implement the remaining string functions: > {code} > CHARACTER_LENGTH > CONCAT > UPPER > LOWER > INITCAP > LIKE > SIMILAR > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-3573) Implement more String functions for Table API
[ https://issues.apache.org/jira/browse/FLINK-3573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15184881#comment-15184881 ] ASF GitHub Bot commented on FLINK-3573: --- Github user twalthr commented on the pull request: https://github.com/apache/flink/pull/1766#issuecomment-193777848 Merging this... > Implement more String functions for Table API > - > > Key: FLINK-3573 > URL: https://issues.apache.org/jira/browse/FLINK-3573 > Project: Flink > Issue Type: Sub-task > Components: Table API >Reporter: Timo Walther >Assignee: Timo Walther > > Implement the remaining string functions: > {code} > CHARACTER_LENGTH > CONCAT > UPPER > LOWER > INITCAP > LIKE > SIMILAR > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (FLINK-3588) Add a streaming (exactly-once) JDBC connector
Chesnay Schepler created FLINK-3588: --- Summary: Add a streaming (exactly-once) JDBC connector Key: FLINK-3588 URL: https://issues.apache.org/jira/browse/FLINK-3588 Project: Flink Issue Type: Improvement Components: Streaming Connectors Affects Versions: 1.0.0 Reporter: Chesnay Schepler Assignee: Chesnay Schepler Fix For: 1.1.0 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-3574) Implement math functions for Table API
[ https://issues.apache.org/jira/browse/FLINK-3574?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15184870#comment-15184870 ] Timo Walther commented on FLINK-3574: - Maybe we need some casting/fallback logic in {{getCallGenerator}} to find a suitable function: first check entire signature, then check signature with casting. {{addSqlFunctionMethod}} just determine which types the final Java method needs. The design of the `ScalarFunctions` class is not completely done yet as we also want to integrate custom user-defined functions later. RexNodeTranslator is the interface between the Table API and the Calcite stack. However, we will have a SQL API in the near future that won't go through the RexNodeTranslator but directly to the CodeGenerator. > Implement math functions for Table API > -- > > Key: FLINK-3574 > URL: https://issues.apache.org/jira/browse/FLINK-3574 > Project: Flink > Issue Type: Sub-task > Components: Table API >Reporter: Timo Walther >Assignee: Dawid Wysakowicz > > {code} > MOD > EXP > POWER > LN > LOG10 > ABS > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
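The casting/fallback lookup Timo suggests could work roughly as follows. This is a hypothetical sketch, not Flink's actual `ScalarFunctions` code: the registry keys, the unconditional DOUBLE-widening rule, and all names are assumptions made for illustration:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

public class CallGeneratorLookup {

    // Registry keyed by (operator, operand types). LOG10 is registered
    // only for DOUBLE, since NUMERIC is abstract and cannot serve as
    // a single registration key.
    static final Map<List<String>, String> GENERATORS = new HashMap<>();
    static {
        GENERATORS.put(Arrays.asList("LOG10", "DOUBLE"), "log10(double)");
    }

    static Optional<String> getCallGenerator(String op, List<String> types) {
        // 1) exact signature match
        List<String> exact = new ArrayList<>();
        exact.add(op);
        exact.addAll(types);
        String hit = GENERATORS.get(exact);
        if (hit != null) {
            return Optional.of(hit);
        }
        // 2) fallback: retry with operands widened (here, for brevity,
        // everything is widened to DOUBLE; real logic would widen only
        // numeric types along the type lattice)
        List<String> widened = new ArrayList<>();
        widened.add(op);
        for (int i = 0; i < types.size(); i++) {
            widened.add("DOUBLE");
        }
        return Optional.ofNullable(GENERATORS.get(widened));
    }

    public static void main(String[] args) {
        // An INT operand finds the DOUBLE registration via the fallback.
        System.out.println(
                getCallGenerator("LOG10", Arrays.asList("INT")).orElse("none"));
    }
}
```

With such a fallback, `addSqlFunctionMethod` would only ever register concrete types, and the widening step stands in for the missing NUMERIC registration.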
[jira] [Closed] (FLINK-3169) Drop {{Key}} type from Record Data Model
[ https://issues.apache.org/jira/browse/FLINK-3169?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chesnay Schepler closed FLINK-3169. --- Resolution: Fixed Removed in fe117afd31e823006f680d145c8f75033816ed17 > Drop {{Key}} type from Record Data Model > > > Key: FLINK-3169 > URL: https://issues.apache.org/jira/browse/FLINK-3169 > Project: Flink > Issue Type: Improvement >Affects Versions: 0.10.1 >Reporter: Stephan Ewen >Assignee: Chesnay Schepler > Fix For: 1.0.0 > > > The {{Key}} type is currently still used by > - The RecordComparator: This can be moved to the test scope of > {{flink-runtime}} > - The range partitioner distributions. They are outdated anyways and not > used any more. > - All value types can be moved to {{NormalizableKey}} -- This message was sent by Atlassian JIRA (v6.3.4#6332)