[jira] [Commented] (FLINK-3579) Improve String concatenation

2016-03-08 Thread Timo Walther (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15186694#comment-15186694
 ] 

Timo Walther commented on FLINK-3579:
-

Hey,
thanks for your interest. I think no one is working on this issue right now, so 
you can work on it if you like. Should I assign it to you?
Maybe this issue has the same problem as FLINK-3086.

> Improve String concatenation
> ----------------------------
>
> Key: FLINK-3579
> URL: https://issues.apache.org/jira/browse/FLINK-3579
> Project: Flink
>  Issue Type: Bug
>  Components: Table API
>Reporter: Timo Walther
>Priority: Minor
>
> Concatenation of a String and a non-String does not work properly.
> E.g. {{f0 + 42}} leads to a RelBuilder exception.
> The ExpressionParser does not accept {{f0 + 42.cast(STRING)}} either.
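
To make the failure concrete, here is a minimal reproduction sketch, assuming the Java Table API of Flink 1.0 where expressions are passed as Strings (the table setup below is hypothetical and not from the ticket; package locations may differ on the tableOnCalcite branch):

{code}
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.api.java.table.TableEnvironment;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.api.table.Table;

public class ConcatRepro {
    public static void main(String[] args) throws Exception {
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
        TableEnvironment tableEnv = new TableEnvironment();

        // Hypothetical input: f0 is a String field, f1 an Integer field.
        DataSet<Tuple2<String, Integer>> input = env.fromElements(Tuple2.of("a", 1));
        Table table = tableEnv.fromDataSet(input);

        // Reported to fail with a RelBuilder exception:
        Table t1 = table.select("f0 + 42");

        // Reported to be rejected by the ExpressionParser as well:
        Table t2 = table.select("f0 + 42.cast(STRING)");
    }
}
{code}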





[jira] [Commented] (FLINK-1707) Add an Affinity Propagation Library Method

2016-03-08 Thread Vasia Kalavri (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15186662#comment-15186662
 ] 

Vasia Kalavri commented on FLINK-1707:
--

Hi Josep,
no problem! I'll look into your comments. I'm guessing most of them will be 
resolved with a small example. Just ping me if you need help!
-Vasia.

> Add an Affinity Propagation Library Method
> --
>
> Key: FLINK-1707
> URL: https://issues.apache.org/jira/browse/FLINK-1707
> Project: Flink
>  Issue Type: New Feature
>  Components: Gelly
>Reporter: Vasia Kalavri
>Assignee: Josep Rubió
>Priority: Minor
>  Labels: requires-design-doc
> Attachments: Binary_Affinity_Propagation_in_Flink_design_doc.pdf
>
>
> This issue proposes adding an implementation of the Affinity Propagation 
> algorithm as a Gelly library method and a corresponding example.
> The algorithm is described in the paper [1], and a description of a vertex-centric 
> implementation can be found in [2].
> [1]: http://www.psi.toronto.edu/affinitypropagation/FreyDueckScience07.pdf
> [2]: http://event.cwi.nl/grades2014/00-ching-slides.pdf
> Design doc:
> https://docs.google.com/document/d/1QULalzPqMVICi8jRVs3S0n39pell2ZVc7RNemz_SGA4/edit?usp=sharing





[jira] [Commented] (FLINK-2997) Support range partition with user customized data distribution.

2016-03-08 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15186355#comment-15186355
 ] 

ASF GitHub Bot commented on FLINK-2997:
---

GitHub user gallenvara opened a pull request:

https://github.com/apache/flink/pull/1776

[FLINK-2997] Support range partition with user customized data distribution.

Sometimes users have better knowledge of the source data, and they can build 
a customized `data distribution` to do range partitioning more efficiently.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/gallenvara/flink flink-2997

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/1776.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1776


commit 1e00767da3e83d688847ffe5c4ab77cab68f2062
Author: gallenvara 
Date:   2016-01-27T17:34:10Z

Enable range partition with custom data distribution.




> Support range partition with user customized data distribution.
> ---
>
> Key: FLINK-2997
> URL: https://issues.apache.org/jira/browse/FLINK-2997
> Project: Flink
>  Issue Type: New Feature
>Reporter: Chengxiang Li
>
> This is follow-up work for FLINK-7. Sometimes users have better knowledge of 
> the source data, and they can build a customized data distribution to do range 
> partitioning more efficiently.
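
As a rough sketch of the idea (hedged: the interface shape below follows the proposal around this PR, and the exact wiring into range partitioning is an assumption until the change is merged), here is a custom distribution for a single integer key assumed uniform over [0, 1000):

{code}
import java.io.IOException;

import org.apache.flink.api.common.distributions.DataDistribution;
import org.apache.flink.core.memory.DataInputView;
import org.apache.flink.core.memory.DataOutputView;

// Bucket i covers the key range [i*1000/n, (i+1)*1000/n).
public class UniformIntDistribution implements DataDistribution {

    @Override
    public Object[] getBucketBoundary(int bucketNum, int totalNumBuckets) {
        // upper boundary of bucket `bucketNum`
        return new Object[] { (bucketNum + 1) * 1000 / totalNumBuckets };
    }

    @Override
    public int getNumberOfFields() {
        return 1; // single-field partition key
    }

    @Override
    public void write(DataOutputView out) throws IOException {
        // nothing to serialize: the distribution is fixed
    }

    @Override
    public void read(DataInputView in) throws IOException {
        // nothing to deserialize
    }
}
{code}

With such a distribution, the range partitioner can skip the sampling pass it would otherwise need to estimate bucket boundaries; usage would look roughly like {{data.partitionByRange(0).withDataDistribution(new UniformIntDistribution())}} (treat that call as an assumption, since this hook is what the PR is adding).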





[GitHub] flink pull request: [FLINK-2997] Support range partition with user...

2016-03-08 Thread gallenvara
GitHub user gallenvara opened a pull request:

https://github.com/apache/flink/pull/1776

[FLINK-2997] Support range partition with user customized data distribution.

Sometimes users have better knowledge of the source data, and they can build 
a customized `data distribution` to do range partitioning more efficiently.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/gallenvara/flink flink-2997

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/1776.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1776


commit 1e00767da3e83d688847ffe5c4ab77cab68f2062
Author: gallenvara 
Date:   2016-01-27T17:34:10Z

Enable range partition with custom data distribution.






[jira] [Commented] (FLINK-3207) Add a Pregel iteration abstraction to Gelly

2016-03-08 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15185806#comment-15185806
 ] 

ASF GitHub Bot commented on FLINK-3207:
---

Github user s1ck commented on the pull request:

https://github.com/apache/flink/pull/1575#issuecomment-193968765
  
Hi @vasia, this is a nice PR. I commented on some minor documentation / naming 
issues.
+1 for merging


> Add a Pregel iteration abstraction to Gelly
> ---
>
> Key: FLINK-3207
> URL: https://issues.apache.org/jira/browse/FLINK-3207
> Project: Flink
>  Issue Type: New Feature
>  Components: Gelly
>Reporter: Vasia Kalavri
>Assignee: Vasia Kalavri
>
> This issue proposes to add a Pregel/Giraph-like iteration abstraction to 
> Gelly that will only expose one UDF to the user, {{compute()}}. {{compute()}} 
> will have access to both the vertex state and the incoming messages, and will 
> be able to produce messages and update the vertex value.
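
For illustration, a hedged sketch of what a single {{compute()}} UDF could look like under the proposed abstraction, here a 'propagate the minimum value' step (the {{ComputeFunction}} base class and its helper methods are taken from the pull request under review, so treat all names as assumptions):

{code}
// Type parameters: vertex ID, vertex value, edge value, message type.
public static final class MinPropagation
        extends ComputeFunction<Long, Double, Double, Double> {

    @Override
    public void compute(Vertex<Long, Double> vertex,
                        MessageIterator<Double> messages) {
        if (getSuperstepNumber() == 1) {
            // first superstep: announce the initial value to all neighbors
            sendMessageToAllNeighbors(vertex.getValue());
            return;
        }
        double min = vertex.getValue();
        while (messages.hasNext()) {
            min = Math.min(min, messages.next());
        }
        if (min < vertex.getValue()) {
            setNewVertexValue(min);          // update the vertex state
            sendMessageToAllNeighbors(min);  // produce new messages
        }
    }
}
{code}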





[GitHub] flink pull request: [FLINK-3207] [gelly] adds the vertex-centric i...

2016-03-08 Thread s1ck
Github user s1ck commented on the pull request:

https://github.com/apache/flink/pull/1575#issuecomment-193968765
  
Hi @vasia, this is a nice PR. I commented on some minor documentation / naming 
issues.
+1 for merging




[jira] [Commented] (FLINK-1745) Add exact k-nearest-neighbours algorithm to machine learning library

2016-03-08 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15185789#comment-15185789
 ] 

ASF GitHub Bot commented on FLINK-1745:
---

Github user danielblazevski closed the pull request at:

https://github.com/apache/flink/pull/1220


> Add exact k-nearest-neighbours algorithm to machine learning library
> 
>
> Key: FLINK-1745
> URL: https://issues.apache.org/jira/browse/FLINK-1745
> Project: Flink
>  Issue Type: New Feature
>  Components: Machine Learning Library
>Reporter: Till Rohrmann
>Assignee: Daniel Blazevski
>  Labels: ML, Starter
>
> Even though the k-nearest-neighbours (kNN) [1,2] algorithm is quite trivial, 
> it is still used as a means to classify data and to do regression. This issue 
> focuses on the implementation of an exact kNN (H-BNLJ, H-BRJ) algorithm as 
> proposed in [2].
> Could be a starter task.
> Resources:
> [1] [http://en.wikipedia.org/wiki/K-nearest_neighbors_algorithm]
> [2] [https://www.cs.utah.edu/~lifeifei/papers/mrknnj.pdf]
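
For intuition, the single-machine core of the exact computation is just a sort by distance; below is a minimal, Flink-free Java sketch (the distributed H-BNLJ/H-BRJ variants of [2] split this work by pairing blocks of test points with blocks of training points):

{code}
import java.util.Arrays;
import java.util.Comparator;

public class BruteForceKnn {

    // squared Euclidean distance between two points of equal dimension
    static double sqDist(double[] a, double[] b) {
        double s = 0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            s += d * d;
        }
        return s;
    }

    /** Indices of the k training points closest to the query point. */
    static int[] knn(double[][] train, double[] query, int k) {
        Integer[] idx = new Integer[train.length];
        for (int i = 0; i < train.length; i++) {
            idx[i] = i;
        }
        Arrays.sort(idx, Comparator.comparingDouble(i -> sqDist(train[i], query)));
        return Arrays.stream(idx).limit(k).mapToInt(Integer::intValue).toArray();
    }
}
{code}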





[jira] [Commented] (FLINK-1745) Add exact k-nearest-neighbours algorithm to machine learning library

2016-03-08 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15185790#comment-15185790
 ] 

ASF GitHub Bot commented on FLINK-1745:
---

GitHub user danielblazevski reopened a pull request:

https://github.com/apache/flink/pull/1220

[FLINK-1745] Add exact k-nearest-neighbours algorithm to machine learning 
library

I added a quadtree data structure for the kNN algorithm. @chiwanpark 
originally made a pull request for a kNN algorithm, and we coordinated so that 
I could incorporate a tree structure. The quadtree scales very well with the 
number of training + test points, but scales poorly with the dimension (even 
the R-tree scales poorly with the dimension). I added a flag that automatically 
determines whether or not to use the quadtree. My implementation needed to use 
the Euclidean or SquaredEuclidean distance since I needed a specific notion of 
the distance between a test point and a box in the quadtree. I added another 
test, KNNQuadTreeSuite, in addition to Chiwan Park's KNNITSuite, since C. Park's 
parameters will automatically choose the brute-force non-quadtree method.

For more details on the quadtree + how I used it for the KNN query, please 
see another branch I created that has a README.md:

https://github.com/danielblazevski/flink/tree/FLINK-1745-devel/flink-staging/flink-ml/src/main/scala/org/apache/flink/ml/nn
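
The geometric primitive behind that point-to-box distance can be sketched in a few lines; this is a hypothetical Java rendering of the idea, not the PR's Scala code. If the minimum squared distance from a test point to a quadtree node's box already exceeds the current k-th best distance, the whole node can be pruned:

{code}
// Minimum squared Euclidean distance from point p to the axis-aligned box
// [boxMin, boxMax]; returns 0 if p lies inside the box.
static double minSqDistToBox(double[] p, double[] boxMin, double[] boxMax) {
    double sum = 0;
    for (int d = 0; d < p.length; d++) {
        double diff = 0;
        if (p[d] < boxMin[d]) {
            diff = boxMin[d] - p[d];   // p is below the box in dimension d
        } else if (p[d] > boxMax[d]) {
            diff = p[d] - boxMax[d];   // p is above the box in dimension d
        }
        sum += diff * diff;
    }
    return sum;
}
{code}

This bound is cheap to compute for (squared) Euclidean distance, which is why the implementation is tied to those two metrics.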


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/danielblazevski/flink FLINK-1745

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/1220.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1220


commit c7e5056c6d273f6f0f841f77e0fdd91ca221602d
Author: Chiwan Park 
Date:   2015-06-30T08:41:25Z

[FLINK-1745] [ml] Add exact k-nearest-neighbor join

commit 9d0c7942c09086324fadb29bdce749683a0d1a7e
Author: danielblazevski 
Date:   2015-09-15T21:49:05Z

modified kNN test to familiarize with Flink and KNN.scala

commit 611248e57166dc549f86f805b590dd4e45cb3df5
Author: danielblazevski 
Date:   2015-09-15T21:49:17Z

modified kNN test to familiarize with Flink and KNN.scala

commit 1fd8231ce194b52b5a1bd55bbc5e135b3fa5775b
Author: danielblazevski 
Date:   2015-09-16T01:26:57Z

nightly commit, minor changes:  got the filter to work, working on mapping 
the training set to include box labels

commit 15d7d2cb308b23e24c43d103b85a76b0e665cbd3
Author: danielblazevski 
Date:   2015-09-22T02:02:51Z

commit before incorporating quadtree

commit 8f2da8a66516565c59df8828de2715b45397cb7f
Author: danielblazevski 
Date:   2015-09-22T15:49:25Z

did a basic import of QuadTree and Test; to-do:  modify QuadTree to allow 
KNN.scala to make use of it

commit e1cef2c5aea65c6f204caeff6348e2778231f98d
Author: danielblazevski 
Date:   2015-09-22T21:03:04Z

transferred ListBuffers for objects in leaf nodes to Vectors

commit c3387ef2ef59734727b56ea652fdb29af957d20b
Author: danielblazevski 
Date:   2015-09-23T00:41:29Z

basic test on 2D unit box seems to work -- need to generalize, e.g. to 
include automated bounding box

commit 48294ff37a5f800e5111280da5a3c03f4375028d
Author: danielblazevski 
Date:   2015-09-23T15:03:06Z

had to debug quadtree -- back to testing 2D

commit 6403ba14e240ed8d67a296ac789e7e00dece800d
Author: danielblazevski 
Date:   2015-09-23T15:22:46Z

Testing 2D looks good, strong improvement in run time compared to 
brute-force method

commit 426466a40bc2625f390fe0d912f56a346e46c8f8
Author: danielblazevski 
Date:   2015-09-23T19:04:52Z

added automated detection of bounding box based on min/max values of both 
training and test sets

commit c35543b828384aa4ce04d56dfcb3d73db46d1e6d
Author: danielblazevski 
Date:   2015-09-24T00:28:56Z

added automated radius about test point to define localized neighborhood, 
result runs.  TO-DO:  Lots of tests

commit 8e2d2e78f8533d4192aebe9b4baa7efbfa5928a5
Author: danielblazevski 
Date:   2015-09-24T00:54:06Z

Note for future:  previous commit passed the test Chiwan Park had in the initial 
knn implementation

commit d6fd40cb88d6e198e52c368e829bf7d32d432081
Author: danielblazevski 
Date:   2015-09-24T01:56:38Z

Note for future:  previous commit passed the 3D version of the test that Chiwan 
Park had in the initial knn implementation

commit 0ec1f4866157ca073341672e7fe9a50871ac0b7c
Author: 

[GitHub] flink pull request: [FLINK-1745] Add exact k-nearest-neighbours al...

2016-03-08 Thread danielblazevski
GitHub user danielblazevski reopened a pull request:

https://github.com/apache/flink/pull/1220

[FLINK-1745] Add exact k-nearest-neighbours algorithm to machine learning 
library

I added a quadtree data structure for the kNN algorithm. @chiwanpark 
originally made a pull request for a kNN algorithm, and we coordinated so that 
I could incorporate a tree structure. The quadtree scales very well with the 
number of training + test points, but scales poorly with the dimension (even 
the R-tree scales poorly with the dimension). I added a flag that automatically 
determines whether or not to use the quadtree. My implementation needed to use 
the Euclidean or SquaredEuclidean distance since I needed a specific notion of 
the distance between a test point and a box in the quadtree. I added another 
test, KNNQuadTreeSuite, in addition to Chiwan Park's KNNITSuite, since C. Park's 
parameters will automatically choose the brute-force non-quadtree method.

For more details on the quadtree + how I used it for the KNN query, please 
see another branch I created that has a README.md:

https://github.com/danielblazevski/flink/tree/FLINK-1745-devel/flink-staging/flink-ml/src/main/scala/org/apache/flink/ml/nn


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/danielblazevski/flink FLINK-1745

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/1220.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1220


commit c7e5056c6d273f6f0f841f77e0fdd91ca221602d
Author: Chiwan Park 
Date:   2015-06-30T08:41:25Z

[FLINK-1745] [ml] Add exact k-nearest-neighbor join

commit 9d0c7942c09086324fadb29bdce749683a0d1a7e
Author: danielblazevski 
Date:   2015-09-15T21:49:05Z

modified kNN test to familiarize with Flink and KNN.scala

commit 611248e57166dc549f86f805b590dd4e45cb3df5
Author: danielblazevski 
Date:   2015-09-15T21:49:17Z

modified kNN test to familiarize with Flink and KNN.scala

commit 1fd8231ce194b52b5a1bd55bbc5e135b3fa5775b
Author: danielblazevski 
Date:   2015-09-16T01:26:57Z

nightly commit, minor changes:  got the filter to work, working on mapping 
the training set to include box labels

commit 15d7d2cb308b23e24c43d103b85a76b0e665cbd3
Author: danielblazevski 
Date:   2015-09-22T02:02:51Z

commit before incorporating quadtree

commit 8f2da8a66516565c59df8828de2715b45397cb7f
Author: danielblazevski 
Date:   2015-09-22T15:49:25Z

did a basic import of QuadTree and Test; to-do:  modify QuadTree to allow 
KNN.scala to make use of it

commit e1cef2c5aea65c6f204caeff6348e2778231f98d
Author: danielblazevski 
Date:   2015-09-22T21:03:04Z

transferred ListBuffers for objects in leaf nodes to Vectors

commit c3387ef2ef59734727b56ea652fdb29af957d20b
Author: danielblazevski 
Date:   2015-09-23T00:41:29Z

basic test on 2D unit box seems to work -- need to generalize, e.g. to 
include automated bounding box

commit 48294ff37a5f800e5111280da5a3c03f4375028d
Author: danielblazevski 
Date:   2015-09-23T15:03:06Z

had to debug quadtree -- back to testing 2D

commit 6403ba14e240ed8d67a296ac789e7e00dece800d
Author: danielblazevski 
Date:   2015-09-23T15:22:46Z

Testing 2D looks good, strong improvement in run time compared to 
brute-force method

commit 426466a40bc2625f390fe0d912f56a346e46c8f8
Author: danielblazevski 
Date:   2015-09-23T19:04:52Z

added automated detection of bounding box based on min/max values of both 
training and test sets

commit c35543b828384aa4ce04d56dfcb3d73db46d1e6d
Author: danielblazevski 
Date:   2015-09-24T00:28:56Z

added automated radius about test point to define localized neighborhood, 
result runs.  TO-DO:  Lots of tests

commit 8e2d2e78f8533d4192aebe9b4baa7efbfa5928a5
Author: danielblazevski 
Date:   2015-09-24T00:54:06Z

Note for future:  previous commit passed the test Chiwan Park had in the initial 
knn implementation

commit d6fd40cb88d6e198e52c368e829bf7d32d432081
Author: danielblazevski 
Date:   2015-09-24T01:56:38Z

Note for future:  previous commit passed the 3D version of the test that Chiwan 
Park had in the initial knn implementation

commit 0ec1f4866157ca073341672e7fe9a50871ac0b7c
Author: danielblazevski 
Date:   2015-09-24T14:27:20Z

changed filename of QuadTreeTest to QuadTreeSuite, about to make test more 
comprehensive and similar format to other Flink tests

commit 

[GitHub] flink pull request: [FLINK-1745] Add exact k-nearest-neighbours al...

2016-03-08 Thread danielblazevski
Github user danielblazevski closed the pull request at:

https://github.com/apache/flink/pull/1220




[jira] [Commented] (FLINK-1745) Add exact k-nearest-neighbours algorithm to machine learning library

2016-03-08 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15185785#comment-15185785
 ] 

ASF GitHub Bot commented on FLINK-1745:
---

Github user danielblazevski commented on the pull request:

https://github.com/apache/flink/pull/1220#issuecomment-193964375
  
Hi @chiwanpark, I modified the tests and corrected the package + import 
statements; please have a look.

I will add more details soon.



> Add exact k-nearest-neighbours algorithm to machine learning library
> 
>
> Key: FLINK-1745
> URL: https://issues.apache.org/jira/browse/FLINK-1745
> Project: Flink
>  Issue Type: New Feature
>  Components: Machine Learning Library
>Reporter: Till Rohrmann
>Assignee: Daniel Blazevski
>  Labels: ML, Starter
>
> Even though the k-nearest-neighbours (kNN) [1,2] algorithm is quite trivial, 
> it is still used as a means to classify data and to do regression. This issue 
> focuses on the implementation of an exact kNN (H-BNLJ, H-BRJ) algorithm as 
> proposed in [2].
> Could be a starter task.
> Resources:
> [1] [http://en.wikipedia.org/wiki/K-nearest_neighbors_algorithm]
> [2] [https://www.cs.utah.edu/~lifeifei/papers/mrknnj.pdf]





[GitHub] flink pull request: [FLINK-1745] Add exact k-nearest-neighbours al...

2016-03-08 Thread danielblazevski
Github user danielblazevski commented on the pull request:

https://github.com/apache/flink/pull/1220#issuecomment-193964375
  
Hi @chiwanpark, I modified the tests and corrected the package + import 
statements; please have a look.

I will add more details soon.





[jira] [Commented] (FLINK-3574) Implement math functions for Table API

2016-03-08 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3574?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15185644#comment-15185644
 ] 

ASF GitHub Bot commented on FLINK-3574:
---

GitHub user dawidwys opened a pull request:

https://github.com/apache/flink/pull/1775

[FLINK-3574] Implement math functions for Table API



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/dawidwys/flink mathFunctions

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/1775.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1775


commit 1ff3a297ae2bb2d0733b3fca68bd91c54301f556
Author: dawid 
Date:   2016-03-08T19:47:42Z

[FLINK-3574] Implement math functions for Table API




> Implement math functions for Table API
> --
>
> Key: FLINK-3574
> URL: https://issues.apache.org/jira/browse/FLINK-3574
> Project: Flink
>  Issue Type: Sub-task
>  Components: Table API
>Reporter: Timo Walther
>Assignee: Dawid Wysakowicz
>
> {code}
> MOD
> EXP
> POWER
> LN
> LOG10
> ABS
> {code}
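
A hedged usage sketch, assuming the String-expression syntax of the Java Table API at the time ({{table}} is a hypothetical Table with a numeric column {{a}}; the exact function names and call syntax remain open until the PR is merged):

{code}
Table result = table.select("a.abs, a.exp, a.ln, a.log10, a.power(2), a.mod(3)");
{code}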





[GitHub] flink pull request: [FLINK-3574] Implement math functions for Tabl...

2016-03-08 Thread dawidwys
GitHub user dawidwys opened a pull request:

https://github.com/apache/flink/pull/1775

[FLINK-3574] Implement math functions for Table API



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/dawidwys/flink mathFunctions

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/1775.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1775


commit 1ff3a297ae2bb2d0733b3fca68bd91c54301f556
Author: dawid 
Date:   2016-03-08T19:47:42Z

[FLINK-3574] Implement math functions for Table API






[jira] [Assigned] (FLINK-3593) DistinctITCase is failing

2016-03-08 Thread Fabian Hueske (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-3593?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fabian Hueske reassigned FLINK-3593:


Assignee: Fabian Hueske

> DistinctITCase is failing
> -
>
> Key: FLINK-3593
> URL: https://issues.apache.org/jira/browse/FLINK-3593
> Project: Flink
>  Issue Type: Bug
>  Components: Table API
>Reporter: Vasia Kalavri
>Assignee: Fabian Hueske
> Fix For: 1.1.0
>
>
> Failed tests: 
>   DistinctITCase.testDistinctAfterAggregate:70 Internal error: Error while 
> applying rule DataSetAggregateRule, args 
> [rel#28:FlinkAggregate.FLINK.[](input=rel#27:Subset#5.FLINK.[],group={1})]
>   DistinctITCase.testDistinctAfterAggregate:70 Internal error: Error while 
> applying rule DataSetAggregateRule, args 
> [rel#86:FlinkAggregate.FLINK.[](input=rel#85:Subset#17.FLINK.[],group={1})]
>   DistinctITCase.testDistinct:53 Internal error: Error while applying rule 
> DataSetAggregateRule, args 
> [rel#53:FlinkAggregate.FLINK.[](input=rel#52:Subset#10.FLINK.[],group={1})]
>   DistinctITCase.testDistinct:53 Internal error: Error while applying rule 
> DataSetAggregateRule, args 
> [rel#111:FlinkAggregate.FLINK.[](input=rel#110:Subset#22.FLINK.[],group={1})]
>   DistinctITCase.testDistinctAfterAggregate:56 Internal error: Error while 
> applying rule DataSetAggregateRule, args 
> [rel#28:FlinkAggregate.FLINK.[](input=rel#27:Subset#5.FLINK.[],group={1})]
>   DistinctITCase.testDistinctAfterAggregate:56 Internal error: Error while 
> applying rule DataSetAggregateRule, args 
> [rel#86:FlinkAggregate.FLINK.[](input=rel#85:Subset#17.FLINK.[],group={1})]
>   DistinctITCase.testDistinct:44 Internal error: Error while applying rule 
> DataSetAggregateRule, args 
> [rel#53:FlinkAggregate.FLINK.[](input=rel#52:Subset#10.FLINK.[],group={1})]
>   DistinctITCase.testDistinct:44 Internal error: Error while applying rule 
> DataSetAggregateRule, args 
> [rel#111:FlinkAggregate.FLINK.[](input=rel#110:Subset#22.FLINK.[],group={1})]





[jira] [Resolved] (FLINK-3593) DistinctITCase is failing

2016-03-08 Thread Fabian Hueske (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-3593?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fabian Hueske resolved FLINK-3593.
--
   Resolution: Fixed
Fix Version/s: 1.1.0

Fixed on the tableOnCalcite branch.

> DistinctITCase is failing
> -
>
> Key: FLINK-3593
> URL: https://issues.apache.org/jira/browse/FLINK-3593
> Project: Flink
>  Issue Type: Bug
>  Components: Table API
>Reporter: Vasia Kalavri
>Assignee: Fabian Hueske
> Fix For: 1.1.0
>
>
> Failed tests: 
>   DistinctITCase.testDistinctAfterAggregate:70 Internal error: Error while 
> applying rule DataSetAggregateRule, args 
> [rel#28:FlinkAggregate.FLINK.[](input=rel#27:Subset#5.FLINK.[],group={1})]
>   DistinctITCase.testDistinctAfterAggregate:70 Internal error: Error while 
> applying rule DataSetAggregateRule, args 
> [rel#86:FlinkAggregate.FLINK.[](input=rel#85:Subset#17.FLINK.[],group={1})]
>   DistinctITCase.testDistinct:53 Internal error: Error while applying rule 
> DataSetAggregateRule, args 
> [rel#53:FlinkAggregate.FLINK.[](input=rel#52:Subset#10.FLINK.[],group={1})]
>   DistinctITCase.testDistinct:53 Internal error: Error while applying rule 
> DataSetAggregateRule, args 
> [rel#111:FlinkAggregate.FLINK.[](input=rel#110:Subset#22.FLINK.[],group={1})]
>   DistinctITCase.testDistinctAfterAggregate:56 Internal error: Error while 
> applying rule DataSetAggregateRule, args 
> [rel#28:FlinkAggregate.FLINK.[](input=rel#27:Subset#5.FLINK.[],group={1})]
>   DistinctITCase.testDistinctAfterAggregate:56 Internal error: Error while 
> applying rule DataSetAggregateRule, args 
> [rel#86:FlinkAggregate.FLINK.[](input=rel#85:Subset#17.FLINK.[],group={1})]
>   DistinctITCase.testDistinct:44 Internal error: Error while applying rule 
> DataSetAggregateRule, args 
> [rel#53:FlinkAggregate.FLINK.[](input=rel#52:Subset#10.FLINK.[],group={1})]
>   DistinctITCase.testDistinct:44 Internal error: Error while applying rule 
> DataSetAggregateRule, args 
> [rel#111:FlinkAggregate.FLINK.[](input=rel#110:Subset#22.FLINK.[],group={1})]





[jira] [Updated] (FLINK-3593) DistinctITCase is failing

2016-03-08 Thread Vasia Kalavri (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-3593?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vasia Kalavri updated FLINK-3593:
-
Description: 
Failed tests: 
  DistinctITCase.testDistinctAfterAggregate:70 Internal error: Error while 
applying rule DataSetAggregateRule, args 
[rel#28:FlinkAggregate.FLINK.[](input=rel#27:Subset#5.FLINK.[],group={1})]
  DistinctITCase.testDistinctAfterAggregate:70 Internal error: Error while 
applying rule DataSetAggregateRule, args 
[rel#86:FlinkAggregate.FLINK.[](input=rel#85:Subset#17.FLINK.[],group={1})]
  DistinctITCase.testDistinct:53 Internal error: Error while applying rule 
DataSetAggregateRule, args 
[rel#53:FlinkAggregate.FLINK.[](input=rel#52:Subset#10.FLINK.[],group={1})]
  DistinctITCase.testDistinct:53 Internal error: Error while applying rule 
DataSetAggregateRule, args 
[rel#111:FlinkAggregate.FLINK.[](input=rel#110:Subset#22.FLINK.[],group={1})]
  DistinctITCase.testDistinctAfterAggregate:56 Internal error: Error while 
applying rule DataSetAggregateRule, args 
[rel#28:FlinkAggregate.FLINK.[](input=rel#27:Subset#5.FLINK.[],group={1})]
  DistinctITCase.testDistinctAfterAggregate:56 Internal error: Error while 
applying rule DataSetAggregateRule, args 
[rel#86:FlinkAggregate.FLINK.[](input=rel#85:Subset#17.FLINK.[],group={1})]
  DistinctITCase.testDistinct:44 Internal error: Error while applying rule 
DataSetAggregateRule, args 
[rel#53:FlinkAggregate.FLINK.[](input=rel#52:Subset#10.FLINK.[],group={1})]
  DistinctITCase.testDistinct:44 Internal error: Error while applying rule 
DataSetAggregateRule, args 
[rel#111:FlinkAggregate.FLINK.[](input=rel#110:Subset#22.FLINK.[],group={1})]

  was:
The {{DistinctITCase}} test is failing on the {{tableOnCalcite}} branch:
{code}
Failed tests: 
  DistinctITCase.testDistinctAfterAggregate:70 Internal error: Error while 
applying rule DataSetAggregateRule, args 
[rel#28:FlinkAggregate.FLINK.[](input=rel#27:Subset#5.FLINK.[],group={1})]
  DistinctITCase.testDistinctAfterAggregate:70 Internal error: Error while 
applying rule DataSetAggregateRule, args 
[rel#86:FlinkAggregate.FLINK.[](input=rel#85:Subset#17.FLINK.[],group={1})]
  DistinctITCase.testDistinct:53 Internal error: Error while applying rule 
DataSetAggregateRule, args 
[rel#53:FlinkAggregate.FLINK.[](input=rel#52:Subset#10.FLINK.[],group={1})]
  DistinctITCase.testDistinct:53 Internal error: Error while applying rule 
DataSetAggregateRule, args 
[rel#111:FlinkAggregate.FLINK.[](input=rel#110:Subset#22.FLINK.[],group={1})]
  DistinctITCase.testDistinctAfterAggregate:56 Internal error: Error while 
applying rule DataSetAggregateRule, args 
[rel#28:FlinkAggregate.FLINK.[](input=rel#27:Subset#5.FLINK.[],group={1})]
  DistinctITCase.testDistinctAfterAggregate:56 Internal error: Error while 
applying rule DataSetAggregateRule, args 
[rel#86:FlinkAggregate.FLINK.[](input=rel#85:Subset#17.FLINK.[],group={1})]
  DistinctITCase.testDistinct:44 Internal error: Error while applying rule 
DataSetAggregateRule, args 
[rel#53:FlinkAggregate.FLINK.[](input=rel#52:Subset#10.FLINK.[],group={1})]
  DistinctITCase.testDistinct:44 Internal error: Error while applying rule 
DataSetAggregateRule, args 
[rel#111:FlinkAggregate.FLINK.[](input=rel#110:Subset#22.FLINK.[],group={1})]
{code}


> DistinctITCase is failing
> -
>
> Key: FLINK-3593
> URL: https://issues.apache.org/jira/browse/FLINK-3593
> Project: Flink
>  Issue Type: Bug
>  Components: Table API
>Reporter: Vasia Kalavri
>
> Failed tests: 
>   DistinctITCase.testDistinctAfterAggregate:70 Internal error: Error while 
> applying rule DataSetAggregateRule, args 
> [rel#28:FlinkAggregate.FLINK.[](input=rel#27:Subset#5.FLINK.[],group={1})]
>   DistinctITCase.testDistinctAfterAggregate:70 Internal error: Error while 
> applying rule DataSetAggregateRule, args 
> [rel#86:FlinkAggregate.FLINK.[](input=rel#85:Subset#17.FLINK.[],group={1})]
>   DistinctITCase.testDistinct:53 Internal error: Error while applying rule 
> DataSetAggregateRule, args 
> [rel#53:FlinkAggregate.FLINK.[](input=rel#52:Subset#10.FLINK.[],group={1})]
>   DistinctITCase.testDistinct:53 Internal error: Error while applying rule 
> DataSetAggregateRule, args 
> [rel#111:FlinkAggregate.FLINK.[](input=rel#110:Subset#22.FLINK.[],group={1})]
>   DistinctITCase.testDistinctAfterAggregate:56 Internal error: Error while 
> applying rule DataSetAggregateRule, args 
> [rel#28:FlinkAggregate.FLINK.[](input=rel#27:Subset#5.FLINK.[],group={1})]
>   DistinctITCase.testDistinctAfterAggregate:56 Internal error: Error while 
> applying rule DataSetAggregateRule, args 
> [rel#86:FlinkAggregate.FLINK.[](input=rel#85:Subset#17.FLINK.[],group={1})]
>   DistinctITCase.testDistinct:44 Internal error: Error while applying rule 
> DataSetAggregateRule, args 
> 

[jira] [Created] (FLINK-3593) DistinctITCase is failing

2016-03-08 Thread Vasia Kalavri (JIRA)
Vasia Kalavri created FLINK-3593:


 Summary: DistinctITCase is failing
 Key: FLINK-3593
 URL: https://issues.apache.org/jira/browse/FLINK-3593
 Project: Flink
  Issue Type: Bug
  Components: Table API
Reporter: Vasia Kalavri


The {{DistinctITCase}} test is failing on the {{tableOnCalcite}} branch:
{code}
Failed tests: 
  DistinctITCase.testDistinctAfterAggregate:70 Internal error: Error while 
applying rule DataSetAggregateRule, args 
[rel#28:FlinkAggregate.FLINK.[](input=rel#27:Subset#5.FLINK.[],group={1})]
  DistinctITCase.testDistinctAfterAggregate:70 Internal error: Error while 
applying rule DataSetAggregateRule, args 
[rel#86:FlinkAggregate.FLINK.[](input=rel#85:Subset#17.FLINK.[],group={1})]
  DistinctITCase.testDistinct:53 Internal error: Error while applying rule 
DataSetAggregateRule, args 
[rel#53:FlinkAggregate.FLINK.[](input=rel#52:Subset#10.FLINK.[],group={1})]
  DistinctITCase.testDistinct:53 Internal error: Error while applying rule 
DataSetAggregateRule, args 
[rel#111:FlinkAggregate.FLINK.[](input=rel#110:Subset#22.FLINK.[],group={1})]
  DistinctITCase.testDistinctAfterAggregate:56 Internal error: Error while 
applying rule DataSetAggregateRule, args 
[rel#28:FlinkAggregate.FLINK.[](input=rel#27:Subset#5.FLINK.[],group={1})]
  DistinctITCase.testDistinctAfterAggregate:56 Internal error: Error while 
applying rule DataSetAggregateRule, args 
[rel#86:FlinkAggregate.FLINK.[](input=rel#85:Subset#17.FLINK.[],group={1})]
  DistinctITCase.testDistinct:44 Internal error: Error while applying rule 
DataSetAggregateRule, args 
[rel#53:FlinkAggregate.FLINK.[](input=rel#52:Subset#10.FLINK.[],group={1})]
  DistinctITCase.testDistinct:44 Internal error: Error while applying rule 
DataSetAggregateRule, args 
[rel#111:FlinkAggregate.FLINK.[](input=rel#110:Subset#22.FLINK.[],group={1})]
{code}





[jira] [Closed] (FLINK-3592) Update setup quickstart

2016-03-08 Thread Ufuk Celebi (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-3592?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ufuk Celebi closed FLINK-3592.
--
Resolution: Fixed

Implemented in bb9c1fa (release-1.0) and be1cf9d (master).

> Update setup quickstart
> ---
>
> Key: FLINK-3592
> URL: https://issues.apache.org/jira/browse/FLINK-3592
> Project: Flink
>  Issue Type: Improvement
>  Components: Documentation
>Reporter: Ufuk Celebi
>Assignee: Ufuk Celebi
>
> The setup quickstart is outdated and shows a batch example instead of a 
> streaming one.





[jira] [Created] (FLINK-3592) Update setup quickstart

2016-03-08 Thread Ufuk Celebi (JIRA)
Ufuk Celebi created FLINK-3592:
--

 Summary: Update setup quickstart
 Key: FLINK-3592
 URL: https://issues.apache.org/jira/browse/FLINK-3592
 Project: Flink
  Issue Type: Improvement
  Components: Documentation
Reporter: Ufuk Celebi
Assignee: Ufuk Celebi


The setup quickstart is outdated and shows a batch example instead of a 
streaming one.





[jira] [Resolved] (FLINK-3591) Replace Quickstart K-Means Example by Streaming Example

2016-03-08 Thread Aljoscha Krettek (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-3591?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aljoscha Krettek resolved FLINK-3591.
-
   Resolution: Fixed
Fix Version/s: 1.0.1
   1.1.0

Changed in 
https://github.com/apache/flink/commit/35ad1972d1604b4f69a9bfb12f00c280ad6262f8

> Replace Quickstart K-Means Example by Streaming Example
> ---
>
> Key: FLINK-3591
> URL: https://issues.apache.org/jira/browse/FLINK-3591
> Project: Flink
>  Issue Type: Improvement
>  Components: Documentation
>Reporter: Aljoscha Krettek
>Assignee: Aljoscha Krettek
> Fix For: 1.1.0, 1.0.1
>
>






[jira] [Commented] (FLINK-3591) Replace Quickstart K-Means Example by Streaming Example

2016-03-08 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15185154#comment-15185154
 ] 

ASF GitHub Bot commented on FLINK-3591:
---

Github user aljoscha commented on the pull request:

https://github.com/apache/flink/pull/1774#issuecomment-193844948
  
Alright, I merged it with the fixes.


> Replace Quickstart K-Means Example by Streaming Example
> ---
>
> Key: FLINK-3591
> URL: https://issues.apache.org/jira/browse/FLINK-3591
> Project: Flink
>  Issue Type: Improvement
>  Components: Documentation
>Reporter: Aljoscha Krettek
>Assignee: Aljoscha Krettek
>






[jira] [Commented] (FLINK-3591) Replace Quickstart K-Means Example by Streaming Example

2016-03-08 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15185155#comment-15185155
 ] 

ASF GitHub Bot commented on FLINK-3591:
---

Github user aljoscha closed the pull request at:

https://github.com/apache/flink/pull/1774


> Replace Quickstart K-Means Example by Streaming Example
> ---
>
> Key: FLINK-3591
> URL: https://issues.apache.org/jira/browse/FLINK-3591
> Project: Flink
>  Issue Type: Improvement
>  Components: Documentation
>Reporter: Aljoscha Krettek
>Assignee: Aljoscha Krettek
>






[GitHub] flink pull request: [FLINK-3591] Replace Quickstart K-Means Exampl...

2016-03-08 Thread aljoscha
Github user aljoscha commented on the pull request:

https://github.com/apache/flink/pull/1774#issuecomment-193844948
  
Alright, I merged it with the fixes.




[GitHub] flink pull request: [FLINK-3591] Replace Quickstart K-Means Exampl...

2016-03-08 Thread aljoscha
Github user aljoscha closed the pull request at:

https://github.com/apache/flink/pull/1774




[jira] [Commented] (FLINK-3579) Improve String concatenation

2016-03-08 Thread ramkrishna.s.vasudevan (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15185140#comment-15185140
 ] 

ramkrishna.s.vasudevan commented on FLINK-3579:
---

Is it possible for me to work on this JIRA if you are not going to take it up 
immediately, [~twalthr], and if it is not critical for your Table work to 
progress?

> Improve String concatenation
> ----------------------------
>
> Key: FLINK-3579
> URL: https://issues.apache.org/jira/browse/FLINK-3579
> Project: Flink
>  Issue Type: Bug
>  Components: Table API
>Reporter: Timo Walther
>Priority: Minor
>
> Concatenation of a String and a non-String does not work properly.
> E.g. {{f0 + 42}} leads to a RelBuilder exception.
> The ExpressionParser does not accept {{f0 + 42.cast(STRING)}} either.





[jira] [Commented] (FLINK-3591) Replace Quickstart K-Means Example by Streaming Example

2016-03-08 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15185107#comment-15185107
 ] 

ASF GitHub Bot commented on FLINK-3591:
---

Github user uce commented on the pull request:

https://github.com/apache/flink/pull/1774#issuecomment-193828321
  
Very very very nice! Thanks for doing this! I like it a lot.

I was wondering whether we want to have a Scala version as well. What do 
you think? We can do it as a follow up if you don't have time for it now.




> Replace Quickstart K-Means Example by Streaming Example
> ---
>
> Key: FLINK-3591
> URL: https://issues.apache.org/jira/browse/FLINK-3591
> Project: Flink
>  Issue Type: Improvement
>  Components: Documentation
>Reporter: Aljoscha Krettek
>Assignee: Aljoscha Krettek
>






[jira] [Commented] (FLINK-3591) Replace Quickstart K-Means Example by Streaming Example

2016-03-08 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15185112#comment-15185112
 ] 

ASF GitHub Bot commented on FLINK-3591:
---

Github user aljoscha commented on the pull request:

https://github.com/apache/flink/pull/1774#issuecomment-193830427
  
Thanks @uce as well! :smiley: 

I incorporated all the fixes. The only thing I'm not happy with is the 
section with the window. I feel that I can't do it without some small 
explanation, but it also can't be too big/in-depth.

Scala I would do as a follow-up.


> Replace Quickstart K-Means Example by Streaming Example
> ---
>
> Key: FLINK-3591
> URL: https://issues.apache.org/jira/browse/FLINK-3591
> Project: Flink
>  Issue Type: Improvement
>  Components: Documentation
>Reporter: Aljoscha Krettek
>Assignee: Aljoscha Krettek
>






[GitHub] flink pull request: [FLINK-3591] Replace Quickstart K-Means Exampl...

2016-03-08 Thread aljoscha
Github user aljoscha commented on the pull request:

https://github.com/apache/flink/pull/1774#issuecomment-193830427
  
Thanks @uce as well! :smiley: 

I incorporated all the fixes. The only thing I'm not happy with is the 
section with the window. I feel that I can't do it without some small 
explanation, but it also can't be too big/in-depth.

Scala I would do as a follow-up.




[jira] [Updated] (FLINK-2444) Add tests for HadoopInputFormats

2016-03-08 Thread Chesnay Schepler (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-2444?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chesnay Schepler updated FLINK-2444:

Assignee: Martin Liesenberg

> Add tests for HadoopInputFormats
> 
>
> Key: FLINK-2444
> URL: https://issues.apache.org/jira/browse/FLINK-2444
> Project: Flink
>  Issue Type: Test
>  Components: Hadoop Compatibility, Tests
>Affects Versions: 0.9.0, 0.10.0
>Reporter: Fabian Hueske
>Assignee: Martin Liesenberg
>  Labels: starter
>
> The HadoopInputFormats and HadoopInputFormatBase classes are not sufficiently 
> covered by unit tests.
> We need tests that ensure that the methods of the wrapped Hadoop InputFormats 
> are correctly called. 
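
One way to test that delegation, sketched with Mockito (a hypothetical test, not from the issue; it assumes the mapreduce-API wrapper and that {{createInputSplits()}} forwards to the wrapped format's {{getSplits()}}):

{code}
import static org.mockito.Mockito.*;

import java.util.Collections;

import org.apache.flink.api.java.hadoop.mapreduce.HadoopInputFormat;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.InputFormat;
import org.apache.hadoop.mapreduce.InputSplit;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.JobContext;
import org.junit.Test;

public class HadoopInputFormatDelegationTest {

    @Test
    public void createInputSplitsCallsWrappedGetSplits() throws Exception {
        @SuppressWarnings("unchecked")
        InputFormat<LongWritable, Text> wrapped = mock(InputFormat.class);
        when(wrapped.getSplits(any(JobContext.class)))
                .thenReturn(Collections.<InputSplit>emptyList());

        HadoopInputFormat<LongWritable, Text> format = new HadoopInputFormat<>(
                wrapped, LongWritable.class, Text.class, Job.getInstance());

        format.createInputSplits(1);

        // the wrapper must have asked the wrapped format for its splits
        verify(wrapped).getSplits(any(JobContext.class));
    }
}
{code}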





[GitHub] flink pull request: [FLINK-3591] Replace Quickstart K-Means Exampl...

2016-03-08 Thread uce
Github user uce commented on the pull request:

https://github.com/apache/flink/pull/1774#issuecomment-193828321
  
Very very very nice! Thanks for doing this! I like it a lot.

I was wondering whether we want to have a Scala version as well. What do 
you think? We can do it as a follow up if you don't have time for it now.






[GitHub] flink pull request: [FLINK-3591] Replace Quickstart K-Means Exampl...

2016-03-08 Thread uce
Github user uce commented on a diff in the pull request:

https://github.com/apache/flink/pull/1774#discussion_r55376606
  
--- Diff: docs/quickstart/run_example_quickstart.md ---
@@ -27,116 +27,360 @@ under the License.
 * This will be replaced by the TOC
 {:toc}
 
-This guide walks you through the steps of executing an example program 
([K-Means clustering](http://en.wikipedia.org/wiki/K-means_clustering)) on 
Flink. 
-On the way, you will see the a visualization of the program, the optimized 
execution plan, and track the progress of its execution.
+In this guide we will start from scratch and fo from setting up a Flink 
project and running
+a streaming analysis program on a Flink cluster.
+
+Wikipedia provides an IRC channel where all edits to the wiki are logged. 
We are going to
+read this channel in Flink and count the number of bytes that each user 
edits within
+a given window of time. This is easy enough to implement in a few minutes 
using Flink but it will
+give you a good foundation from which to start building more complex 
analysis programs on your own.
+
+## Setting up a Maven Project
+
+We are going to use a Flink Maven Archetype for creating our project 
stucture. Please
+see [Java API Quickstart]({{ site.baseurl 
}}/quickstart/java_api_quickstart.html) for more details
+about this. For our purposes, the command to run is this:
+
+{% highlight bash %}
+$ mvn archetype:generate\
+-DarchetypeGroupId=org.apache.flink\
+-DarchetypeArtifactId=flink-quickstart-java\
+-DarchetypeVersion=1.0.0\
+-DgroupId=wiki-edits\
+-DartifactId=wiki-edits\
+-Dversion=0.1\
+-Dpackage=wikiedits\
+-DinteractiveMode=false\
+{% endhighlight %}
+
+You can edit the `groupId`, `artifactId` and `package` if you like. With 
the above parameters,
+maven will create a project structure that looks like this:
+
+{% highlight bash %}
+$ tree wiki-edits
+wiki-edits/
+├── pom.xml
+└── src
+└── main
+├── java
+│   └── wikiedits
+│   ├── Job.java
+│   ├── SocketTextStreamWordCount.java
+│   └── WordCount.java
+└── resources
+└── log4j.properties
+{% endhighlight %}
+
+There is our `pom.xml` file that already has the Flink dependencies added 
in the root directory and
+several example Flink programs in `src/main/java`. We can delete the 
example programs, since
+we are going to start from scratch:
+
+{% highlight bash %}
+$ rm wiki-edits/src/main/java/wikiedits/*.java
+{% endhighlight %}
+
+As a last step we need to add the Flink wikipedia connector as a 
dependency so that we can
+use it in our program. Edit the `dependencies` section so that it looks 
like this:
+
+{% highlight xml %}
+<dependencies>
+    <dependency>
+        <groupId>org.apache.flink</groupId>
+        <artifactId>flink-java</artifactId>
+        <version>${flink.version}</version>
+    </dependency>
+    <dependency>
+        <groupId>org.apache.flink</groupId>
+        <artifactId>flink-streaming-java_2.10</artifactId>
+        <version>${flink.version}</version>
+    </dependency>
+    <dependency>
+        <groupId>org.apache.flink</groupId>
+        <artifactId>flink-clients_2.10</artifactId>
+        <version>${flink.version}</version>
+    </dependency>
+    <dependency>
+        <groupId>org.apache.flink</groupId>
+        <artifactId>flink-connector-wikiedits_2.10</artifactId>
+        <version>${flink.version}</version>
+    </dependency>
+</dependencies>
+{% endhighlight %}
+
+Notice the `flink-connector-wikiedits_2.10` dependency that was added.
+
+## Writing a Flink Program
+
+It's coding time. Fire up your favorite IDE and import the Maven project 
or open a text editor and
+create the file `src/main/wikiedits/WikipediaAnalysis.java`:
+
+{% highlight java %}
+package wikiedits;
+
+public class WikipediaAnalysis {
+
+public static void main(String[] args) throws Exception {
+
+}
+}
+{% endhighlight %}
+
+I admit it's very bare bones now but we will fill it as we go. Note, that 
I'll not give
+import statements here since IDEs can add them automatically. At the end 
of this section I'll show
+the complete code with import statements if you simply want to skip ahead 
and enter that in your
+editor.
+
+The first step in a Flink program is to create a 
`StreamExecutionEnvironment`
+(or `ExecutionEnvironment` if you are writing a batch job). This can be 
used to set execution
+parameters and create sources for reading from external systems. So let's 
go ahead, add
+this to the main method:
+
+{% highlight java %}
+StreamExecutionEnvironment see = 
StreamExecutionEnvironment.getExecutionEnvironment();
+{% endhighlight %}
+
+Next, we will create a source that reads from the Wikipedia IRC log:
+
+{% highlight java %}
+DataStream edits = 

[jira] [Commented] (FLINK-3591) Replace Quickstart K-Means Example by Streaming Example

2016-03-08 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15185103#comment-15185103
 ] 

ASF GitHub Bot commented on FLINK-3591:
---

Github user uce commented on a diff in the pull request:

https://github.com/apache/flink/pull/1774#discussion_r55376606
  
--- Diff: docs/quickstart/run_example_quickstart.md ---
@@ -27,116 +27,360 @@ under the License.
 * This will be replaced by the TOC
 {:toc}
 
-This guide walks you through the steps of executing an example program 
([K-Means clustering](http://en.wikipedia.org/wiki/K-means_clustering)) on 
Flink. 
-On the way, you will see the a visualization of the program, the optimized 
execution plan, and track the progress of its execution.
+In this guide we will start from scratch and fo from setting up a Flink 
project and running
+a streaming analysis program on a Flink cluster.
+
+Wikipedia provides an IRC channel where all edits to the wiki are logged. 
We are going to
+read this channel in Flink and count the number of bytes that each user 
edits within
+a given window of time. This is easy enough to implement in a few minutes 
using Flink but it will
+give you a good foundation from which to start building more complex 
analysis programs on your own.
+
+## Setting up a Maven Project
+
+We are going to use a Flink Maven Archetype for creating our project 
stucture. Please
+see [Java API Quickstart]({{ site.baseurl 
}}/quickstart/java_api_quickstart.html) for more details
+about this. For our purposes, the command to run is this:
+
+{% highlight bash %}
+$ mvn archetype:generate\
+-DarchetypeGroupId=org.apache.flink\
+-DarchetypeArtifactId=flink-quickstart-java\
+-DarchetypeVersion=1.0.0\
+-DgroupId=wiki-edits\
+-DartifactId=wiki-edits\
+-Dversion=0.1\
+-Dpackage=wikiedits\
+-DinteractiveMode=false\
+{% endhighlight %}
+
+You can edit the `groupId`, `artifactId` and `package` if you like. With 
the above parameters,
+maven will create a project structure that looks like this:
+
+{% highlight bash %}
+$ tree wiki-edits
+wiki-edits/
+├── pom.xml
+└── src
+└── main
+├── java
+│   └── wikiedits
+│   ├── Job.java
+│   ├── SocketTextStreamWordCount.java
+│   └── WordCount.java
+└── resources
+└── log4j.properties
+{% endhighlight %}
+
+There is our `pom.xml` file that already has the Flink dependencies added 
in the root directory and
+several example Flink programs in `src/main/java`. We can delete the 
example programs, since
+we are going to start from scratch:
+
+{% highlight bash %}
+$ rm wiki-edits/src/main/java/wikiedits/*.java
+{% endhighlight %}
+
+As a last step we need to add the Flink wikipedia connector as a 
dependency so that we can
+use it in our program. Edit the `dependencies` section so that it looks 
like this:
+
+{% highlight xml %}
+<dependencies>
+    <dependency>
+        <groupId>org.apache.flink</groupId>
+        <artifactId>flink-java</artifactId>
+        <version>${flink.version}</version>
+    </dependency>
+    <dependency>
+        <groupId>org.apache.flink</groupId>
+        <artifactId>flink-streaming-java_2.10</artifactId>
+        <version>${flink.version}</version>
+    </dependency>
+    <dependency>
+        <groupId>org.apache.flink</groupId>
+        <artifactId>flink-clients_2.10</artifactId>
+        <version>${flink.version}</version>
+    </dependency>
+    <dependency>
+        <groupId>org.apache.flink</groupId>
+        <artifactId>flink-connector-wikiedits_2.10</artifactId>
+        <version>${flink.version}</version>
+    </dependency>
+</dependencies>
+{% endhighlight %}
+
+Notice the `flink-connector-wikiedits_2.10` dependency that was added.
+
+## Writing a Flink Program
+
+It's coding time. Fire up your favorite IDE and import the Maven project 
or open a text editor and
+create the file `src/main/wikiedits/WikipediaAnalysis.java`:
+
+{% highlight java %}
+package wikiedits;
+
+public class WikipediaAnalysis {
+
+public static void main(String[] args) throws Exception {
+
+}
+}
+{% endhighlight %}
+
+I admit it's very bare bones now but we will fill it as we go. Note, that 
I'll not give
+import statements here since IDEs can add them automatically. At the end 
of this section I'll show
+the complete code with import statements if you simply want to skip ahead 
and enter that in your
+editor.
+
+The first step in a Flink program is to create a 
`StreamExecutionEnvironment`
+(or `ExecutionEnvironment` if you are writing a batch job). This can be 
used to set execution
+parameters and create sources for reading from external systems. So let's 
go ahead, add
+this to the main method:
+
+{% highlight java %}
+StreamExecutionEnvironment see = 
StreamExecutionEnvironment.getExecutionEnvironment();
+{% 

[jira] [Commented] (FLINK-3591) Replace Quickstart K-Means Example by Streaming Example

2016-03-08 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15185080#comment-15185080
 ] 

ASF GitHub Bot commented on FLINK-3591:
---

Github user uce commented on a diff in the pull request:

https://github.com/apache/flink/pull/1774#discussion_r55375032
  
--- Diff: docs/quickstart/run_example_quickstart.md ---
@@ -27,116 +27,360 @@ under the License.
 * This will be replaced by the TOC
 {:toc}
 
-This guide walks you through the steps of executing an example program 
([K-Means clustering](http://en.wikipedia.org/wiki/K-means_clustering)) on 
Flink. 
-On the way, you will see the a visualization of the program, the optimized 
execution plan, and track the progress of its execution.
+In this guide we will start from scratch and fo from setting up a Flink 
project and running
--- End diff --

go from to?


> Replace Quickstart K-Means Example by Streaming Example
> ---
>
> Key: FLINK-3591
> URL: https://issues.apache.org/jira/browse/FLINK-3591
> Project: Flink
>  Issue Type: Improvement
>  Components: Documentation
>Reporter: Aljoscha Krettek
>Assignee: Aljoscha Krettek
>






[GitHub] flink pull request: [FLINK-3591] Replace Quickstart K-Means Exampl...

2016-03-08 Thread uce
Github user uce commented on a diff in the pull request:

https://github.com/apache/flink/pull/1774#discussion_r55375307
  
--- Diff: docs/quickstart/run_example_quickstart.md ---
@@ -27,116 +27,360 @@ under the License.
 * This will be replaced by the TOC
 {:toc}
 
-This guide walks you through the steps of executing an example program 
([K-Means clustering](http://en.wikipedia.org/wiki/K-means_clustering)) on 
Flink. 
-On the way, you will see the a visualization of the program, the optimized 
execution plan, and track the progress of its execution.
+In this guide we will start from scratch and fo from setting up a Flink 
project and running
+a streaming analysis program on a Flink cluster.
+
+Wikipedia provides an IRC channel where all edits to the wiki are logged. 
We are going to
+read this channel in Flink and count the number of bytes that each user 
edits within
+a given window of time. This is easy enough to implement in a few minutes 
using Flink but it will
+give you a good foundation from which to start building more complex 
analysis programs on your own.
+
+## Setting up a Maven Project
+
+We are going to use a Flink Maven Archetype for creating our project 
stucture. Please
+see [Java API Quickstart]({{ site.baseurl 
}}/quickstart/java_api_quickstart.html) for more details
+about this. For our purposes, the command to run is this:
+
+{% highlight bash %}
+$ mvn archetype:generate\
+-DarchetypeGroupId=org.apache.flink\
+-DarchetypeArtifactId=flink-quickstart-java\
+-DarchetypeVersion=1.0.0\
+-DgroupId=wiki-edits\
+-DartifactId=wiki-edits\
+-Dversion=0.1\
+-Dpackage=wikiedits\
+-DinteractiveMode=false\
+{% endhighlight %}
+
+You can edit the `groupId`, `artifactId` and `package` if you like. With 
the above parameters,
+maven will create a project structure that looks like this:
+
+{% highlight bash %}
+$ tree wiki-edits
+wiki-edits/
+├── pom.xml
+└── src
+└── main
+├── java
+│   └── wikiedits
+│   ├── Job.java
+│   ├── SocketTextStreamWordCount.java
+│   └── WordCount.java
+└── resources
+└── log4j.properties
+{% endhighlight %}
+
+There is our `pom.xml` file that already has the Flink dependencies added 
in the root directory and
+several example Flink programs in `src/main/java`. We can delete the 
example programs, since
+we are going to start from scratch:
+
+{% highlight bash %}
+$ rm wiki-edits/src/main/java/wikiedits/*.java
+{% endhighlight %}
+
+As a last step we need to add the Flink wikipedia connector as a 
dependency so that we can
--- End diff --

Wikipedia capitalization




[jira] [Commented] (FLINK-3591) Replace Quickstart K-Means Example by Streaming Example

2016-03-08 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15185099#comment-15185099
 ] 

ASF GitHub Bot commented on FLINK-3591:
---

Github user uce commented on a diff in the pull request:

https://github.com/apache/flink/pull/1774#discussion_r55376436
  
--- Diff: docs/quickstart/run_example_quickstart.md ---
[... duplicate quote of the quickstart diff elided; the same excerpt is quoted in full in the next message. The archive cuts this JIRA mirror off before the inline review comment ...]

[GitHub] flink pull request: [FLINK-3591] Replace Quickstart K-Means Exampl...

2016-03-08 Thread uce
Github user uce commented on a diff in the pull request:

https://github.com/apache/flink/pull/1774#discussion_r55376436
  
--- Diff: docs/quickstart/run_example_quickstart.md ---
@@ -27,116 +27,360 @@ under the License.
 * This will be replaced by the TOC
 {:toc}
 
-This guide walks you through the steps of executing an example program 
([K-Means clustering](http://en.wikipedia.org/wiki/K-means_clustering)) on 
Flink. 
-On the way, you will see the a visualization of the program, the optimized 
execution plan, and track the progress of its execution.
+In this guide we will start from scratch and fo from setting up a Flink 
project and running
+a streaming analysis program on a Flink cluster.
+
+Wikipedia provides an IRC channel where all edits to the wiki are logged. 
We are going to
+read this channel in Flink and count the number of bytes that each user 
edits within
+a given window of time. This is easy enough to implement in a few minutes 
using Flink but it will
+give you a good foundation from which to start building more complex 
analysis programs on your own.
+
+## Setting up a Maven Project
+
+We are going to use a Flink Maven Archetype for creating our project 
stucture. Please
+see [Java API Quickstart]({{ site.baseurl 
}}/quickstart/java_api_quickstart.html) for more details
+about this. For our purposes, the command to run is this:
+
+{% highlight bash %}
+$ mvn archetype:generate \
+    -DarchetypeGroupId=org.apache.flink \
+    -DarchetypeArtifactId=flink-quickstart-java \
+    -DarchetypeVersion=1.0.0 \
+    -DgroupId=wiki-edits \
+    -DartifactId=wiki-edits \
+    -Dversion=0.1 \
+    -Dpackage=wikiedits \
+    -DinteractiveMode=false
+{% endhighlight %}
+
+You can edit the `groupId`, `artifactId` and `package` if you like. With 
the above parameters,
+maven will create a project structure that looks like this:
+
+{% highlight bash %}
+$ tree wiki-edits
+wiki-edits/
+├── pom.xml
+└── src
+    └── main
+        ├── java
+        │   └── wikiedits
+        │       ├── Job.java
+        │       ├── SocketTextStreamWordCount.java
+        │       └── WordCount.java
+        └── resources
+            └── log4j.properties
+{% endhighlight %}
+
+There is our `pom.xml` file that already has the Flink dependencies added 
in the root directory and
+several example Flink programs in `src/main/java`. We can delete the 
example programs, since
+we are going to start from scratch:
+
+{% highlight bash %}
+$ rm wiki-edits/src/main/java/wikiedits/*.java
+{% endhighlight %}
+
+As a last step we need to add the Flink wikipedia connector as a 
dependency so that we can
+use it in our program. Edit the `dependencies` section so that it looks 
like this:
+
+{% highlight xml %}
+<dependencies>
+    <dependency>
+        <groupId>org.apache.flink</groupId>
+        <artifactId>flink-java</artifactId>
+        <version>${flink.version}</version>
+    </dependency>
+    <dependency>
+        <groupId>org.apache.flink</groupId>
+        <artifactId>flink-streaming-java_2.10</artifactId>
+        <version>${flink.version}</version>
+    </dependency>
+    <dependency>
+        <groupId>org.apache.flink</groupId>
+        <artifactId>flink-clients_2.10</artifactId>
+        <version>${flink.version}</version>
+    </dependency>
+    <dependency>
+        <groupId>org.apache.flink</groupId>
+        <artifactId>flink-connector-wikiedits_2.10</artifactId>
+        <version>${flink.version}</version>
+    </dependency>
+</dependencies>
+{% endhighlight %}
+
+Notice the `flink-connector-wikiedits_2.10` dependency that was added.
+
+## Writing a Flink Program
+
+It's coding time. Fire up your favorite IDE and import the Maven project 
or open a text editor and
+create the file `src/main/wikiedits/WikipediaAnalysis.java`:
+
+{% highlight java %}
+package wikiedits;
+
+public class WikipediaAnalysis {
+
+    public static void main(String[] args) throws Exception {
+
+    }
+}
+{% endhighlight %}
+
+I admit it's very bare bones now but we will fill it as we go. Note, that 
I'll not give
+import statements here since IDEs can add them automatically. At the end 
of this section I'll show
+the complete code with import statements if you simply want to skip ahead 
and enter that in your
+editor.
+
+The first step in a Flink program is to create a 
`StreamExecutionEnvironment`
+(or `ExecutionEnvironment` if you are writing a batch job). This can be 
used to set execution
+parameters and create sources for reading from external systems. So let's 
go ahead, add
+this to the main method:
+
+{% highlight java %}
+StreamExecutionEnvironment see = 
StreamExecutionEnvironment.getExecutionEnvironment();
+{% endhighlight %}
+
+Next, we will create a source that reads from the Wikipedia IRC log:
+
+{% highlight java %}
+DataStream edits = 
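
The quoted snippet is cut off by the archive mid-statement (the generic parameter of `DataStream` was also eaten by the extraction). For context, a minimal sketch of how the source creation and the per-user byte count presumably continue, assuming the `WikipediaEditsSource` and `WikipediaEditEvent` classes from the flink-connector-wikiedits module; this is a sketch under those assumptions, not necessarily the PR's exact code (imports omitted, as elsewhere in the guide):

{% highlight java %}
// Read the Wikipedia IRC edit stream (assumes flink-connector-wikiedits).
DataStream<WikipediaEditEvent> edits = see.addSource(new WikipediaEditsSource());

// Group the edits by user name.
KeyedStream<WikipediaEditEvent, String> keyedEdits = edits
    .keyBy(new KeySelector<WikipediaEditEvent, String>() {
        @Override
        public String getKey(WikipediaEditEvent event) {
            return event.getUser();
        }
    });

// Sum the byte diff of each user's edits over 5-second windows.
DataStream<Tuple2<String, Long>> result = keyedEdits
    .timeWindow(Time.seconds(5))
    .fold(new Tuple2<>("", 0L),
        new FoldFunction<WikipediaEditEvent, Tuple2<String, Long>>() {
            @Override
            public Tuple2<String, Long> fold(Tuple2<String, Long> acc, WikipediaEditEvent event) {
                acc.f0 = event.getUser();
                acc.f1 += event.getByteDiff();
                return acc;
            }
        });

result.print();
see.execute();
{% endhighlight %}

With this in place the program prints a (user, byteDiff) pair per user and window, which is exactly the analysis the guide's introduction describes.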

[GitHub] flink pull request: [FLINK-3591] Replace Quickstart K-Means Exampl...

2016-03-08 Thread uce
Github user uce commented on a diff in the pull request:

https://github.com/apache/flink/pull/1774#discussion_r55376365
  
--- Diff: docs/quickstart/run_example_quickstart.md ---
[... duplicate quote of the quickstart diff elided; the same excerpt is quoted in full in the r55376436 message above. The archive cuts this copy off before the inline review comment ...]

[jira] [Commented] (FLINK-3591) Replace Quickstart K-Means Example by Streaming Example

2016-03-08 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15185097#comment-15185097
 ] 

ASF GitHub Bot commented on FLINK-3591:
---

Github user aljoscha commented on a diff in the pull request:

https://github.com/apache/flink/pull/1774#discussion_r55376360
  
--- Diff: docs/quickstart/run_example_quickstart.md ---
[... duplicate quote of the quickstart diff elided; the same excerpt is quoted in full in the r55376436 message above. The archive cuts this JIRA mirror off before the inline review comment ...]

[jira] [Commented] (FLINK-3591) Replace Quickstart K-Means Example by Streaming Example

2016-03-08 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15185098#comment-15185098
 ] 

ASF GitHub Bot commented on FLINK-3591:
---

Github user uce commented on a diff in the pull request:

https://github.com/apache/flink/pull/1774#discussion_r55376365
  
--- Diff: docs/quickstart/run_example_quickstart.md ---
[... duplicate quote of the quickstart diff elided; the same excerpt is quoted in full in the r55376436 message above. The archive cuts this JIRA mirror off before the inline review comment ...]

[GitHub] flink pull request: [FLINK-3591] Replace Quickstart K-Means Exampl...

2016-03-08 Thread aljoscha
Github user aljoscha commented on a diff in the pull request:

https://github.com/apache/flink/pull/1774#discussion_r55376360
  
--- Diff: docs/quickstart/run_example_quickstart.md ---
[... duplicate quote of the quickstart diff elided; the same excerpt is quoted in full in the r55376436 message above. The archive cuts this copy off before the inline review comment ...]

[GitHub] flink pull request: [FLINK-3591] Replace Quickstart K-Means Exampl...

2016-03-08 Thread uce
Github user uce commented on a diff in the pull request:

https://github.com/apache/flink/pull/1774#discussion_r55375032
  
--- Diff: docs/quickstart/run_example_quickstart.md ---
@@ -27,116 +27,360 @@ under the License.
 * This will be replaced by the TOC
 {:toc}
 
-This guide walks you through the steps of executing an example program 
([K-Means clustering](http://en.wikipedia.org/wiki/K-means_clustering)) on 
Flink. 
-On the way, you will see the a visualization of the program, the optimized 
execution plan, and track the progress of its execution.
+In this guide we will start from scratch and fo from setting up a Flink 
project and running
--- End diff --

go from to?




[jira] [Updated] (FLINK-2445) Add tests for HadoopOutputFormats

2016-03-08 Thread Chesnay Schepler (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-2445?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chesnay Schepler updated FLINK-2445:

Assignee: Martin Liesenberg

> Add tests for HadoopOutputFormats
> -
>
> Key: FLINK-2445
> URL: https://issues.apache.org/jira/browse/FLINK-2445
> Project: Flink
>  Issue Type: Test
>  Components: Hadoop Compatibility, Tests
>Affects Versions: 0.9.1, 0.10.0
>Reporter: Fabian Hueske
>Assignee: Martin Liesenberg
>  Labels: starter
>
> The HadoopOutputFormats and HadoopOutputFormatBase classes are not 
> sufficiently covered by unit tests.
> We need tests that ensure that the methods of the wrapped Hadoop 
> OutputFormats are correctly called. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
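
To make the intent of the ticket concrete: the kind of test asked for checks that the Flink wrapper forwards calls to the wrapped Hadoop format. Below is a minimal sketch, assuming Mockito and Flink's mapred `HadoopOutputFormat` wrapper; the `recordWriter` field name and the reflective injection are assumptions for illustration, not code from this issue:

{% highlight java %}
// Sketch only: verifies that writeRecord() delegates to the wrapped
// Hadoop RecordWriter. The field name "recordWriter" is assumed.
@Test
public void writeRecordShouldDelegateToWrappedWriter() throws Exception {
    org.apache.hadoop.mapred.OutputFormat<String, Long> wrappedFormat =
            org.mockito.Mockito.mock(org.apache.hadoop.mapred.OutputFormat.class);
    org.apache.hadoop.mapred.RecordWriter<String, Long> writer =
            org.mockito.Mockito.mock(org.apache.hadoop.mapred.RecordWriter.class);

    HadoopOutputFormat<String, Long> format =
            new HadoopOutputFormat<>(wrappedFormat, new org.apache.hadoop.mapred.JobConf());

    // Inject the mocked writer; a real test might instead use a
    // package-private subclass to reach the field.
    java.lang.reflect.Field field =
            HadoopOutputFormatBase.class.getDeclaredField("recordWriter");
    field.setAccessible(true);
    field.set(format, writer);

    format.writeRecord(new Tuple2<>("key", 42L));

    org.mockito.Mockito.verify(writer).write("key", 42L);
}
{% endhighlight %}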


[jira] [Commented] (FLINK-3591) Replace Quickstart K-Means Example by Streaming Example

2016-03-08 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15185087#comment-15185087
 ] 

ASF GitHub Bot commented on FLINK-3591:
---

Github user uce commented on a diff in the pull request:

https://github.com/apache/flink/pull/1774#discussion_r55375679
  
--- Diff: docs/quickstart/run_example_quickstart.md ---
[... duplicate quote of the quickstart diff elided; the same excerpt is quoted in full in the r55376436 message above. The archive cuts this JIRA mirror off before the inline review comment ...]

[jira] [Closed] (FLINK-2315) Hadoop Writables cannot exploit implementing NormalizableKey

2016-03-08 Thread Chesnay Schepler (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-2315?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chesnay Schepler closed FLINK-2315.
---
Resolution: Invalid

> Hadoop Writables cannot exploit implementing NormalizableKey
> 
>
> Key: FLINK-2315
> URL: https://issues.apache.org/jira/browse/FLINK-2315
> Project: Flink
>  Issue Type: Bug
>  Components: DataSet API
>Affects Versions: 0.9
>Reporter: Stephan Ewen
> Fix For: 1.0.0
>
>
> When one implements a type that extends {{hadoop.io.Writable}} and it 
> implements {{NormalizableKey}}, this is still not exploited.
> The Writable comparator fails to recognize that.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
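
For context on the interplay: `NormalizableKey` lets a type write a fixed-length, byte-wise comparable prefix of itself so the sorter can compare keys without deserializing them, and the ticket's point is that the Writable comparator never consulted that hint. A minimal illustration (not code from the ticket) of a Hadoop Writable carrying such a hint, assuming Flink's `NormalizableKey` contract (getMaxNormalizedKeyLen / copyNormalizedKey) and, since `NormalizableKey` in the Flink versions of the time also pulled in the `Value` interface, the DataInputView/DataOutputView read/write pair as well:

{% highlight java %}
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

import org.apache.flink.core.memory.DataInputView;
import org.apache.flink.core.memory.DataOutputView;
import org.apache.flink.core.memory.MemorySegment;
import org.apache.flink.types.NormalizableKey;
import org.apache.hadoop.io.Writable;

// Illustration only: an int-valued key that is both a Hadoop Writable
// and a Flink NormalizableKey.
public class UserId implements Writable, NormalizableKey<UserId> {

    private int id;

    // Hadoop Writable serialization.
    public void write(DataOutput out) throws IOException { out.writeInt(id); }
    public void readFields(DataInput in) throws IOException { id = in.readInt(); }

    // Flink Value serialization (pulled in via NormalizableKey).
    public void write(DataOutputView out) throws IOException { out.writeInt(id); }
    public void read(DataInputView in) throws IOException { id = in.readInt(); }

    public int getMaxNormalizedKeyLen() {
        return 4; // the int fits entirely into the normalized key
    }

    public void copyNormalizedKey(MemorySegment memory, int offset, int len) {
        // Flip the sign bit so unsigned byte-wise order equals int order.
        int norm = id ^ Integer.MIN_VALUE;
        if (len >= 4) {
            memory.putIntBigEndian(offset, norm);
            for (int i = 4; i < len; i++) {
                memory.put(offset + i, (byte) 0);
            }
        } else {
            for (int i = 0; i < len; i++) {
                memory.put(offset + i, (byte) (norm >>> (24 - 8 * i)));
            }
        }
    }

    public int compareTo(UserId other) { return Integer.compare(id, other.id); }
}
{% endhighlight %}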


[jira] [Commented] (FLINK-3591) Replace Quickstart K-Means Example by Streaming Example

2016-03-08 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15185090#comment-15185090
 ] 

ASF GitHub Bot commented on FLINK-3591:
---

Github user uce commented on a diff in the pull request:

https://github.com/apache/flink/pull/1774#discussion_r55375868
  
--- Diff: docs/quickstart/run_example_quickstart.md ---
[... duplicate quote of the quickstart diff elided; the same excerpt is quoted in full in the r55376436 message above. The archive cuts this JIRA mirror off before the inline review comment ...]

[GitHub] flink pull request: [FLINK-3591] Replace Quickstart K-Means Exampl...

2016-03-08 Thread uce
Github user uce commented on a diff in the pull request:

https://github.com/apache/flink/pull/1774#discussion_r55375868
  
--- Diff: docs/quickstart/run_example_quickstart.md ---
[... duplicate quote of the quickstart diff elided; the same excerpt is quoted in full in the r55376436 message above. The archive cuts this copy off before the inline review comment ...]

[jira] [Commented] (FLINK-3591) Replace Quickstart K-Means Example by Streaming Example

2016-03-08 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15185084#comment-15185084
 ] 

ASF GitHub Bot commented on FLINK-3591:
---

Github user uce commented on a diff in the pull request:

https://github.com/apache/flink/pull/1774#discussion_r55375307
  
--- Diff: docs/quickstart/run_example_quickstart.md ---
[... duplicate quote of the quickstart diff elided; the same excerpt is quoted in full in the r55376436 message above ...]

+As a last step we need to add the Flink wikipedia connector as a 
dependency so that we can
--- End diff --

Wikipedia capitalization






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: [FLINK-3591] Replace Quickstart K-Means Exampl...

2016-03-08 Thread uce
Github user uce commented on a diff in the pull request:

https://github.com/apache/flink/pull/1774#discussion_r55375144
  
--- Diff: docs/quickstart/run_example_quickstart.md ---
[... duplicate quote of the quickstart diff elided (archetype command and project layout); the same excerpt is quoted in full in the r55376436 message above ...]

+You can edit the `groupId`, `artifactId` and `package` if you like. With 
the above parameters,
+maven will create a project structure that looks like this:
--- End diff --

Maven capitalization




[jira] [Commented] (FLINK-3591) Replace Quickstart K-Means Example by Streaming Example

2016-03-08 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15185070#comment-15185070
 ] 

ASF GitHub Bot commented on FLINK-3591:
---

Github user vasia commented on a diff in the pull request:

https://github.com/apache/flink/pull/1774#discussion_r55374388
  
--- Diff: docs/quickstart/run_example_quickstart.md ---
[... duplicate quote of the quickstart diff elided; the same excerpt is quoted in full in the r55376436 message above. The archive cuts this JIRA mirror off before the inline review comment ...]

[GitHub] flink pull request: [FLINK-3591] Replace Quickstart K-Means Exampl...

2016-03-08 Thread vasia
Github user vasia commented on a diff in the pull request:

https://github.com/apache/flink/pull/1774#discussion_r55374388
  
--- Diff: docs/quickstart/run_example_quickstart.md ---
[... duplicate quote of the quickstart diff elided; the same excerpt is quoted in full in the r55376436 message above. The archive cuts this copy off before the inline review comment ...]

[jira] [Commented] (FLINK-3591) Replace Quickstart K-Means Example by Streaming Example

2016-03-08 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15185065#comment-15185065
 ] 

ASF GitHub Bot commented on FLINK-3591:
---

Github user aljoscha commented on the pull request:

https://github.com/apache/flink/pull/1774#issuecomment-193822714
  
Thanks for the thorough review @vasia, that was quick :smile: 






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-3591) Replace Quickstart K-Means Example by Streaming Example

2016-03-08 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15185064#comment-15185064
 ] 

ASF GitHub Bot commented on FLINK-3591:
---

Github user aljoscha commented on a diff in the pull request:

https://github.com/apache/flink/pull/1774#discussion_r55373645
  
--- Diff: docs/quickstart/run_example_quickstart.md ---
@@ -27,116 +27,360 @@ under the License.
 * This will be replaced by the TOC
 {:toc}
 
-This guide walks you through the steps of executing an example program 
([K-Means clustering](http://en.wikipedia.org/wiki/K-means_clustering)) on 
Flink. 
-On the way, you will see the a visualization of the program, the optimized 
execution plan, and track the progress of its execution.
+In this guide we will start from scratch and go from setting up a Flink
project to running
+a streaming analysis program on a Flink cluster.
+
+Wikipedia provides an IRC channel where all edits to the wiki are logged. 
We are going to
+read this channel in Flink and count the number of bytes that each user 
edits within
+a given window of time. This is easy enough to implement in a few minutes 
using Flink but it will
+give you a good foundation from which to start building more complex 
analysis programs on your own.
+
+## Setting up a Maven Project
+
+We are going to use a Flink Maven Archetype for creating our project 
structure. Please
+see [Java API Quickstart]({{ site.baseurl 
}}/quickstart/java_api_quickstart.html) for more details
+about this. For our purposes, the command to run is this:
+
+{% highlight bash %}
+$ mvn archetype:generate\
+-DarchetypeGroupId=org.apache.flink\
+-DarchetypeArtifactId=flink-quickstart-java\
+-DarchetypeVersion=1.0.0\
+-DgroupId=wiki-edits\
+-DartifactId=wiki-edits\
+-Dversion=0.1\
+-Dpackage=wikiedits\
+-DinteractiveMode=false
+{% endhighlight %}
+
+You can edit the `groupId`, `artifactId` and `package` if you like. With 
the above parameters,
+maven will create a project structure that looks like this:
+
+{% highlight bash %}
+$ tree wiki-edits
+wiki-edits/
+├── pom.xml
+└── src
+└── main
+├── java
+│   └── wikiedits
+│   ├── Job.java
+│   ├── SocketTextStreamWordCount.java
+│   └── WordCount.java
+└── resources
+└── log4j.properties
+{% endhighlight %}
+
+In the root directory we now have the `pom.xml` file with the Flink dependencies
already added, and
+several example Flink programs in `src/main/java`. We can delete the
example programs, since
+we are going to start from scratch:
+
+{% highlight bash %}
+$ rm wiki-edits/src/main/java/wikiedits/*.java
+{% endhighlight %}
+
+As a last step we need to add the Flink wikipedia connector as a 
dependency so that we can
+use it in our program. Edit the `dependencies` section so that it looks 
like this:
+
+{% highlight xml %}
+<dependencies>
+  <dependency>
+    <groupId>org.apache.flink</groupId>
+    <artifactId>flink-java</artifactId>
+    <version>${flink.version}</version>
+  </dependency>
+  <dependency>
+    <groupId>org.apache.flink</groupId>
+    <artifactId>flink-streaming-java_2.10</artifactId>
+    <version>${flink.version}</version>
+  </dependency>
+  <dependency>
+    <groupId>org.apache.flink</groupId>
+    <artifactId>flink-clients_2.10</artifactId>
+    <version>${flink.version}</version>
+  </dependency>
+  <dependency>
+    <groupId>org.apache.flink</groupId>
+    <artifactId>flink-connector-wikiedits_2.10</artifactId>
+    <version>${flink.version}</version>
+  </dependency>
+</dependencies>
+{% endhighlight %}
+
+Notice the `flink-connector-wikiedits_2.10` dependency that was added.
+
+## Writing a Flink Program
+
+It's coding time. Fire up your favorite IDE and import the Maven project 
or open a text editor and
+create the file `src/main/java/wikiedits/WikipediaAnalysis.java`:
+
+{% highlight java %}
+package wikiedits;
+
+public class WikipediaAnalysis {
+
+public static void main(String[] args) throws Exception {
+
+}
+}
+{% endhighlight %}
+
+I admit it's very bare bones now, but we will fill it in as we go. Note that
I won't give
+import statements here since IDEs can add them automatically. At the end
of this section I'll show
+the complete code with import statements, in case you simply want to skip ahead
and enter it in your
+editor.
+
+The first step in a Flink program is to create a 
`StreamExecutionEnvironment`
+(or `ExecutionEnvironment` if you are writing a batch job). This can be 
used to set execution
+parameters and create sources for reading from external systems. So let's 
go ahead and add
+this to the main method:
+
+{% highlight java %}
+StreamExecutionEnvironment see = 
StreamExecutionEnvironment.getExecutionEnvironment();
+{% endhighlight %}

[jira] [Commented] (FLINK-3591) Replace Quickstart K-Means Example by Streaming Example

2016-03-08 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15185060#comment-15185060
 ] 

ASF GitHub Bot commented on FLINK-3591:
---

Github user vasia commented on the pull request:

https://github.com/apache/flink/pull/1774#issuecomment-193821304
  
Just a few misspellings. Otherwise looks great!


> Replace Quickstart K-Means Example by Streaming Example
> ---
>
> Key: FLINK-3591
> URL: https://issues.apache.org/jira/browse/FLINK-3591
> Project: Flink
>  Issue Type: Improvement
>  Components: Documentation
>Reporter: Aljoscha Krettek
>Assignee: Aljoscha Krettek
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: Improve TaskManagerTest#testRemotePartitionNot...

2016-03-08 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/flink/pull/1658


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-3591) Replace Quickstart K-Means Example by Streaming Example

2016-03-08 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15185055#comment-15185055
 ] 

ASF GitHub Bot commented on FLINK-3591:
---

Github user vasia commented on a diff in the pull request:

https://github.com/apache/flink/pull/1774#discussion_r55372164
  
--- Diff: docs/quickstart/run_example_quickstart.md ---
@@ -27,116 +27,360 @@ under the License.
 * This will be replaced by the TOC
 {:toc}
 
-This guide walks you through the steps of executing an example program 
([K-Means clustering](http://en.wikipedia.org/wiki/K-means_clustering)) on 
Flink. 
-On the way, you will see the a visualization of the program, the optimized 
execution plan, and track the progress of its execution.
+In this guide we will start from scratch and go from setting up a Flink
project to running
+a streaming analysis program on a Flink cluster.
+
+Wikipedia provides an IRC channel where all edits to the wiki are logged. 
We are going to
+read this channel in Flink and count the number of bytes that each user 
edits within
+a given window of time. This is easy enough to implement in a few minutes 
using Flink but it will
+give you a good foundation from which to start building more complex 
analysis programs on your own.
+
+## Setting up a Maven Project
+
+We are going to use a Flink Maven Archetype for creating our project 
structure. Please
+see [Java API Quickstart]({{ site.baseurl 
}}/quickstart/java_api_quickstart.html) for more details
+about this. For our purposes, the command to run is this:
+
+{% highlight bash %}
+$ mvn archetype:generate\
+-DarchetypeGroupId=org.apache.flink\
+-DarchetypeArtifactId=flink-quickstart-java\
+-DarchetypeVersion=1.0.0\
+-DgroupId=wiki-edits\
+-DartifactId=wiki-edits\
+-Dversion=0.1\
+-Dpackage=wikiedits\
+-DinteractiveMode=false
+{% endhighlight %}
+
+You can edit the `groupId`, `artifactId` and `package` if you like. With 
the above parameters,
+maven will create a project structure that looks like this:
+
+{% highlight bash %}
+$ tree wiki-edits
+wiki-edits/
+├── pom.xml
+└── src
+└── main
+├── java
+│   └── wikiedits
+│   ├── Job.java
+│   ├── SocketTextStreamWordCount.java
+│   └── WordCount.java
+└── resources
+└── log4j.properties
+{% endhighlight %}
+
+In the root directory we now have the `pom.xml` file with the Flink dependencies
already added, and
+several example Flink programs in `src/main/java`. We can delete the
example programs, since
+we are going to start from scratch:
+
+{% highlight bash %}
+$ rm wiki-edits/src/main/java/wikiedits/*.java
+{% endhighlight %}
+
+As a last step we need to add the Flink wikipedia connector as a 
dependency so that we can
+use it in our program. Edit the `dependencies` section so that it looks 
like this:
+
+{% highlight xml %}
+<dependencies>
+  <dependency>
+    <groupId>org.apache.flink</groupId>
+    <artifactId>flink-java</artifactId>
+    <version>${flink.version}</version>
+  </dependency>
+  <dependency>
+    <groupId>org.apache.flink</groupId>
+    <artifactId>flink-streaming-java_2.10</artifactId>
+    <version>${flink.version}</version>
+  </dependency>
+  <dependency>
+    <groupId>org.apache.flink</groupId>
+    <artifactId>flink-clients_2.10</artifactId>
+    <version>${flink.version}</version>
+  </dependency>
+  <dependency>
+    <groupId>org.apache.flink</groupId>
+    <artifactId>flink-connector-wikiedits_2.10</artifactId>
+    <version>${flink.version}</version>
+  </dependency>
+</dependencies>
+{% endhighlight %}
+
+Notice the `flink-connector-wikiedits_2.10` dependency that was added.
+
+## Writing a Flink Program
+
+It's coding time. Fire up your favorite IDE and import the Maven project 
or open a text editor and
+create the file `src/main/java/wikiedits/WikipediaAnalysis.java`:
+
+{% highlight java %}
+package wikiedits;
+
+public class WikipediaAnalysis {
+
+public static void main(String[] args) throws Exception {
+
+}
+}
+{% endhighlight %}
+
+I admit it's very bare bones now, but we will fill it in as we go. Note that
I won't give
+import statements here since IDEs can add them automatically. At the end
of this section I'll show
+the complete code with import statements, in case you simply want to skip ahead
and enter it in your
+editor.
+
+The first step in a Flink program is to create a 
`StreamExecutionEnvironment`
+(or `ExecutionEnvironment` if you are writing a batch job). This can be 
used to set execution
+parameters and create sources for reading from external systems. So let's 
go ahead and add
+this to the main method:
+
+{% highlight java %}
+StreamExecutionEnvironment see = 
StreamExecutionEnvironment.getExecutionEnvironment();
+{% endhighlight %}

[jira] [Commented] (FLINK-3591) Replace Quickstart K-Means Example by Streaming Example

2016-03-08 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15185051#comment-15185051
 ] 

ASF GitHub Bot commented on FLINK-3591:
---

Github user aljoscha commented on a diff in the pull request:

https://github.com/apache/flink/pull/1774#discussion_r55371997
  
--- Diff: docs/quickstart/run_example_quickstart.md ---
@@ -27,116 +27,360 @@ under the License.
 * This will be replaced by the TOC
 {:toc}
 
-This guide walks you through the steps of executing an example program 
([K-Means clustering](http://en.wikipedia.org/wiki/K-means_clustering)) on 
Flink. 
-On the way, you will see the a visualization of the program, the optimized 
execution plan, and track the progress of its execution.
+In this guide we will start from scratch and go from setting up a Flink
project to running
+a streaming analysis program on a Flink cluster.
+
+Wikipedia provides an IRC channel where all edits to the wiki are logged. 
We are going to
+read this channel in Flink and count the number of bytes that each user 
edits within
+a given window of time. This is easy enough to implement in a few minutes 
using Flink but it will
+give you a good foundation from which to start building more complex 
analysis programs on your own.
+
+## Setting up a Maven Project
+
+We are going to use a Flink Maven Archetype for creating our project 
structure. Please
+see [Java API Quickstart]({{ site.baseurl 
}}/quickstart/java_api_quickstart.html) for more details
+about this. For our purposes, the command to run is this:
+
+{% highlight bash %}
+$ mvn archetype:generate\
+-DarchetypeGroupId=org.apache.flink\
+-DarchetypeArtifactId=flink-quickstart-java\
+-DarchetypeVersion=1.0.0\
+-DgroupId=wiki-edits\
+-DartifactId=wiki-edits\
+-Dversion=0.1\
+-Dpackage=wikiedits\
+-DinteractiveMode=false
+{% endhighlight %}
+
+You can edit the `groupId`, `artifactId` and `package` if you like. With 
the above parameters,
+maven will create a project structure that looks like this:
+
+{% highlight bash %}
+$ tree wiki-edits
+wiki-edits/
+├── pom.xml
+└── src
+└── main
+├── java
+│   └── wikiedits
+│   ├── Job.java
+│   ├── SocketTextStreamWordCount.java
+│   └── WordCount.java
+└── resources
+└── log4j.properties
+{% endhighlight %}
+
+In the root directory we now have the `pom.xml` file with the Flink dependencies
already added, and
+several example Flink programs in `src/main/java`. We can delete the
example programs, since
+we are going to start from scratch:
+
+{% highlight bash %}
+$ rm wiki-edits/src/main/java/wikiedits/*.java
+{% endhighlight %}
+
+As a last step we need to add the Flink wikipedia connector as a 
dependency so that we can
+use it in our program. Edit the `dependencies` section so that it looks 
like this:
+
+{% highlight xml %}
+<dependencies>
+  <dependency>
+    <groupId>org.apache.flink</groupId>
+    <artifactId>flink-java</artifactId>
+    <version>${flink.version}</version>
+  </dependency>
+  <dependency>
+    <groupId>org.apache.flink</groupId>
+    <artifactId>flink-streaming-java_2.10</artifactId>
+    <version>${flink.version}</version>
+  </dependency>
+  <dependency>
+    <groupId>org.apache.flink</groupId>
+    <artifactId>flink-clients_2.10</artifactId>
+    <version>${flink.version}</version>
+  </dependency>
+  <dependency>
+    <groupId>org.apache.flink</groupId>
+    <artifactId>flink-connector-wikiedits_2.10</artifactId>
+    <version>${flink.version}</version>
+  </dependency>
+</dependencies>
+{% endhighlight %}
+
+Notice the `flink-connector-wikiedits_2.10` dependency that was added.
+
+## Writing a Flink Program
+
+It's coding time. Fire up your favorite IDE and import the Maven project 
or open a text editor and
+create the file `src/main/java/wikiedits/WikipediaAnalysis.java`:
+
+{% highlight java %}
+package wikiedits;
+
+public class WikipediaAnalysis {
+
+public static void main(String[] args) throws Exception {
+
+}
+}
+{% endhighlight %}
+
+I admit it's very bare bones now, but we will fill it in as we go. Note that
I won't give
+import statements here since IDEs can add them automatically. At the end
of this section I'll show
+the complete code with import statements, in case you simply want to skip ahead
and enter it in your
+editor.
+
+The first step in a Flink program is to create a 
`StreamExecutionEnvironment`
+(or `ExecutionEnvironment` if you are writing a batch job). This can be 
used to set execution
+parameters and create sources for reading from external systems. So let's 
go ahead and add
+this to the main method:
+
+{% highlight java %}
+StreamExecutionEnvironment see = 
StreamExecutionEnvironment.getExecutionEnvironment();
+{% endhighlight %}

[jira] [Commented] (FLINK-1707) Add an Affinity Propagation Library Method

2016-03-08 Thread JIRA

[ 
https://issues.apache.org/jira/browse/FLINK-1707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15185045#comment-15185045
 ] 

Josep Rubió commented on FLINK-1707:


Hi Vasia,

First, sorry for my delay; I have not had much time in the last few weeks. I tried to give a
response to your questions, but I don't know if I gave you enough info. I'll try to
work on it this week/weekend.

I have some doubts about how to store the vertex state. I'll write a section
explaining what I did, but basically it uses Flink tuples.
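
For illustration only, a hypothetical sketch of per-vertex state kept in a Flink tuple
(the field layout and names here are my assumptions, not the actual design):

    // org.apache.flink.api.java.tuple.Tuple2, e.g. (responsibility, availability)
    Tuple2<Double, Double> state = new Tuple2<>(0.0, 0.0);
    state.f0 = 0.7;  // tuple fields are public and can be updated in place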

Thanks!

> Add an Affinity Propagation Library Method
> --
>
> Key: FLINK-1707
> URL: https://issues.apache.org/jira/browse/FLINK-1707
> Project: Flink
>  Issue Type: New Feature
>  Components: Gelly
>Reporter: Vasia Kalavri
>Assignee: Josep Rubió
>Priority: Minor
>  Labels: requires-design-doc
> Attachments: Binary_Affinity_Propagation_in_Flink_design_doc.pdf
>
>
> This issue proposes adding an implementation of the Affinity Propagation 
> algorithm as a Gelly library method and a corresponding example.
> The algorithm is described in paper [1] and a description of a vertex-centric 
> implementation can be found in [2].
> [1]: http://www.psi.toronto.edu/affinitypropagation/FreyDueckScience07.pdf
> [2]: http://event.cwi.nl/grades2014/00-ching-slides.pdf
> Design doc:
> https://docs.google.com/document/d/1QULalzPqMVICi8jRVs3S0n39pell2ZVc7RNemz_SGA4/edit?usp=sharing



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-3591) Replace Quickstart K-Means Example by Streaming Example

2016-03-08 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15185050#comment-15185050
 ] 

ASF GitHub Bot commented on FLINK-3591:
---

Github user vasia commented on a diff in the pull request:

https://github.com/apache/flink/pull/1774#discussion_r55371996
  
--- Diff: docs/quickstart/run_example_quickstart.md ---
@@ -27,116 +27,360 @@ under the License.
 * This will be replaced by the TOC
 {:toc}
 
-This guide walks you through the steps of executing an example program 
([K-Means clustering](http://en.wikipedia.org/wiki/K-means_clustering)) on 
Flink. 
-On the way, you will see the a visualization of the program, the optimized 
execution plan, and track the progress of its execution.
+In this guide we will start from scratch and go from setting up a Flink
project to running
+a streaming analysis program on a Flink cluster.
+
+Wikipedia provides an IRC channel where all edits to the wiki are logged. 
We are going to
+read this channel in Flink and count the number of bytes that each user 
edits within
+a given window of time. This is easy enough to implement in a few minutes 
using Flink but it will
+give you a good foundation from which to start building more complex 
analysis programs on your own.
+
+## Setting up a Maven Project
+
+We are going to use a Flink Maven Archetype for creating our project 
structure. Please
+see [Java API Quickstart]({{ site.baseurl 
}}/quickstart/java_api_quickstart.html) for more details
+about this. For our purposes, the command to run is this:
+
+{% highlight bash %}
+$ mvn archetype:generate\
+-DarchetypeGroupId=org.apache.flink\
+-DarchetypeArtifactId=flink-quickstart-java\
+-DarchetypeVersion=1.0.0\
+-DgroupId=wiki-edits\
+-DartifactId=wiki-edits\
+-Dversion=0.1\
+-Dpackage=wikiedits\
+-DinteractiveMode=false
+{% endhighlight %}
+
+You can edit the `groupId`, `artifactId` and `package` if you like. With 
the above parameters,
+maven will create a project structure that looks like this:
+
+{% highlight bash %}
+$ tree wiki-edits
+wiki-edits/
+├── pom.xml
+└── src
+└── main
+├── java
+│   └── wikiedits
+│   ├── Job.java
+│   ├── SocketTextStreamWordCount.java
+│   └── WordCount.java
+└── resources
+└── log4j.properties
+{% endhighlight %}
+
+In the root directory we now have the `pom.xml` file with the Flink dependencies
already added, and
+several example Flink programs in `src/main/java`. We can delete the
example programs, since
+we are going to start from scratch:
+
+{% highlight bash %}
+$ rm wiki-edits/src/main/java/wikiedits/*.java
+{% endhighlight %}
+
+As a last step we need to add the Flink wikipedia connector as a 
dependency so that we can
+use it in our program. Edit the `dependencies` section so that it looks 
like this:
+
+{% highlight xml %}
+<dependencies>
+  <dependency>
+    <groupId>org.apache.flink</groupId>
+    <artifactId>flink-java</artifactId>
+    <version>${flink.version}</version>
+  </dependency>
+  <dependency>
+    <groupId>org.apache.flink</groupId>
+    <artifactId>flink-streaming-java_2.10</artifactId>
+    <version>${flink.version}</version>
+  </dependency>
+  <dependency>
+    <groupId>org.apache.flink</groupId>
+    <artifactId>flink-clients_2.10</artifactId>
+    <version>${flink.version}</version>
+  </dependency>
+  <dependency>
+    <groupId>org.apache.flink</groupId>
+    <artifactId>flink-connector-wikiedits_2.10</artifactId>
+    <version>${flink.version}</version>
+  </dependency>
+</dependencies>
+{% endhighlight %}
+
+Notice the `flink-connector-wikiedits_2.10` dependency that was added.
+
+## Writing a Flink Program
+
+It's coding time. Fire up your favorite IDE and import the Maven project 
or open a text editor and
+create the file `src/main/java/wikiedits/WikipediaAnalysis.java`:
+
+{% highlight java %}
+package wikiedits;
+
+public class WikipediaAnalysis {
+
+public static void main(String[] args) throws Exception {
+
+}
+}
+{% endhighlight %}
+
+I admit it's very bare bones now, but we will fill it in as we go. Note that
I won't give
+import statements here since IDEs can add them automatically. At the end
of this section I'll show
+the complete code with import statements, in case you simply want to skip ahead
and enter it in your
+editor.
+
+The first step in a Flink program is to create a 
`StreamExecutionEnvironment`
+(or `ExecutionEnvironment` if you are writing a batch job). This can be 
used to set execution
+parameters and create sources for reading from external systems. So let's 
go ahead and add
+this to the main method:
+
+{% highlight java %}
+StreamExecutionEnvironment see = 
StreamExecutionEnvironment.getExecutionEnvironment();
+{% endhighlight %}

[GitHub] flink pull request: Improve TaskManagerTest#testRemotePartitionNot...

2016-03-08 Thread zentol
Github user zentol commented on the pull request:

https://github.com/apache/flink/pull/1658#issuecomment-193817912
  
merging this.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-3591) Replace Quickstart K-Means Example by Streaming Example

2016-03-08 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15185042#comment-15185042
 ] 

ASF GitHub Bot commented on FLINK-3591:
---

Github user vasia commented on a diff in the pull request:

https://github.com/apache/flink/pull/1774#discussion_r55371459
  
--- Diff: docs/quickstart/run_example_quickstart.md ---
@@ -27,116 +27,360 @@ under the License.
 * This will be replaced by the TOC
 {:toc}
 
-This guide walks you through the steps of executing an example program 
([K-Means clustering](http://en.wikipedia.org/wiki/K-means_clustering)) on 
Flink. 
-On the way, you will see the a visualization of the program, the optimized 
execution plan, and track the progress of its execution.
+In this guide we will start from scratch and go from setting up a Flink
project to running
+a streaming analysis program on a Flink cluster.
+
+Wikipedia provides an IRC channel where all edits to the wiki are logged. 
We are going to
+read this channel in Flink and count the number of bytes that each user 
edits within
+a given window of time. This is easy enough to implement in a few minutes 
using Flink but it will
+give you a good foundation from which to start building more complex 
analysis programs on your own.
+
+## Setting up a Maven Project
+
+We are going to use a Flink Maven Archetype for creating our project 
structure. Please
+see [Java API Quickstart]({{ site.baseurl 
}}/quickstart/java_api_quickstart.html) for more details
+about this. For our purposes, the command to run is this:
+
+{% highlight bash %}
+$ mvn archetype:generate\
+-DarchetypeGroupId=org.apache.flink\
+-DarchetypeArtifactId=flink-quickstart-java\
+-DarchetypeVersion=1.0.0\
+-DgroupId=wiki-edits\
+-DartifactId=wiki-edits\
+-Dversion=0.1\
+-Dpackage=wikiedits\
+-DinteractiveMode=false
+{% endhighlight %}
+
+You can edit the `groupId`, `artifactId` and `package` if you like. With 
the above parameters,
+maven will create a project structure that looks like this:
+
+{% highlight bash %}
+$ tree wiki-edits
+wiki-edits/
+├── pom.xml
+└── src
+└── main
+├── java
+│   └── wikiedits
+│   ├── Job.java
+│   ├── SocketTextStreamWordCount.java
+│   └── WordCount.java
+└── resources
+└── log4j.properties
+{% endhighlight %}
+
+In the root directory we now have the `pom.xml` file with the Flink dependencies
already added, and
+several example Flink programs in `src/main/java`. We can delete the
example programs, since
+we are going to start from scratch:
+
+{% highlight bash %}
+$ rm wiki-edits/src/main/java/wikiedits/*.java
+{% endhighlight %}
+
+As a last step we need to add the Flink wikipedia connector as a 
dependency so that we can
+use it in our program. Edit the `dependencies` section so that it looks 
like this:
+
+{% highlight xml %}
+<dependencies>
+  <dependency>
+    <groupId>org.apache.flink</groupId>
+    <artifactId>flink-java</artifactId>
+    <version>${flink.version}</version>
+  </dependency>
+  <dependency>
+    <groupId>org.apache.flink</groupId>
+    <artifactId>flink-streaming-java_2.10</artifactId>
+    <version>${flink.version}</version>
+  </dependency>
+  <dependency>
+    <groupId>org.apache.flink</groupId>
+    <artifactId>flink-clients_2.10</artifactId>
+    <version>${flink.version}</version>
+  </dependency>
+  <dependency>
+    <groupId>org.apache.flink</groupId>
+    <artifactId>flink-connector-wikiedits_2.10</artifactId>
+    <version>${flink.version}</version>
+  </dependency>
+</dependencies>
+{% endhighlight %}
+
+Notice the `flink-connector-wikiedits_2.10` dependency that was added.
+
+## Writing a Flink Program
+
+It's coding time. Fire up your favorite IDE and import the Maven project 
or open a text editor and
+create the file `src/main/java/wikiedits/WikipediaAnalysis.java`:
+
+{% highlight java %}
+package wikiedits;
+
+public class WikipediaAnalysis {
+
+public static void main(String[] args) throws Exception {
+
+}
+}
+{% endhighlight %}
+
+I admit it's very bare bones now, but we will fill it in as we go. Note that
I won't give
+import statements here since IDEs can add them automatically. At the end
of this section I'll show
+the complete code with import statements, in case you simply want to skip ahead
and enter it in your
+editor.
+
+The first step in a Flink program is to create a 
`StreamExecutionEnvironment`
+(or `ExecutionEnvironment` if you are writing a batch job). This can be 
used to set execution
+parameters and create sources for reading from external systems. So let's 
go ahead and add
+this to the main method:
+
+{% highlight java %}
+StreamExecutionEnvironment see = 
StreamExecutionEnvironment.getExecutionEnvironment();
+{% endhighlight %}

[GitHub] flink pull request: [FLINK-3591] Replace Quickstart K-Means Exampl...

2016-03-08 Thread vasia
Github user vasia commented on a diff in the pull request:

https://github.com/apache/flink/pull/1774#discussion_r55371429
  
--- Diff: docs/quickstart/run_example_quickstart.md ---
@@ -27,116 +27,360 @@ under the License.
 * This will be replaced by the TOC
 {:toc}
 
-This guide walks you through the steps of executing an example program 
([K-Means clustering](http://en.wikipedia.org/wiki/K-means_clustering)) on 
Flink. 
-On the way, you will see the a visualization of the program, the optimized 
execution plan, and track the progress of its execution.
+In this guide we will start from scratch and go from setting up a Flink
project to running
+a streaming analysis program on a Flink cluster.
+
+Wikipedia provides an IRC channel where all edits to the wiki are logged. 
We are going to
+read this channel in Flink and count the number of bytes that each user 
edits within
+a given window of time. This is easy enough to implement in a few minutes 
using Flink but it will
+give you a good foundation from which to start building more complex 
analysis programs on your own.
+
+## Setting up a Maven Project
+
+We are going to use a Flink Maven Archetype for creating our project 
structure. Please
+see [Java API Quickstart]({{ site.baseurl 
}}/quickstart/java_api_quickstart.html) for more details
+about this. For our purposes, the command to run is this:
+
+{% highlight bash %}
+$ mvn archetype:generate\
+-DarchetypeGroupId=org.apache.flink\
+-DarchetypeArtifactId=flink-quickstart-java\
+-DarchetypeVersion=1.0.0\
+-DgroupId=wiki-edits\
+-DartifactId=wiki-edits\
+-Dversion=0.1\
+-Dpackage=wikiedits\
+-DinteractiveMode=false
+{% endhighlight %}
+
+You can edit the `groupId`, `artifactId` and `package` if you like. With 
the above parameters,
+maven will create a project structure that looks like this:
+
+{% highlight bash %}
+$ tree wiki-edits
+wiki-edits/
+├── pom.xml
+└── src
+└── main
+├── java
+│   └── wikiedits
+│   ├── Job.java
+│   ├── SocketTextStreamWordCount.java
+│   └── WordCount.java
+└── resources
+└── log4j.properties
+{% endhighlight %}
+
+In the root directory we now have the `pom.xml` file with the Flink dependencies
already added, and
+several example Flink programs in `src/main/java`. We can delete the
example programs, since
+we are going to start from scratch:
+
+{% highlight bash %}
+$ rm wiki-edits/src/main/java/wikiedits/*.java
+{% endhighlight %}
+
+As a last step we need to add the Flink wikipedia connector as a 
dependency so that we can
+use it in our program. Edit the `dependencies` section so that it looks 
like this:
+
+{% highlight xml %}
+<dependencies>
+  <dependency>
+    <groupId>org.apache.flink</groupId>
+    <artifactId>flink-java</artifactId>
+    <version>${flink.version}</version>
+  </dependency>
+  <dependency>
+    <groupId>org.apache.flink</groupId>
+    <artifactId>flink-streaming-java_2.10</artifactId>
+    <version>${flink.version}</version>
+  </dependency>
+  <dependency>
+    <groupId>org.apache.flink</groupId>
+    <artifactId>flink-clients_2.10</artifactId>
+    <version>${flink.version}</version>
+  </dependency>
+  <dependency>
+    <groupId>org.apache.flink</groupId>
+    <artifactId>flink-connector-wikiedits_2.10</artifactId>
+    <version>${flink.version}</version>
+  </dependency>
+</dependencies>
+{% endhighlight %}
+
+Notice the `flink-connector-wikiedits_2.10` dependency that was added.
+
+## Writing a Flink Program
+
+It's coding time. Fire up your favorite IDE and import the Maven project 
or open a text editor and
+create the file `src/main/java/wikiedits/WikipediaAnalysis.java`:
+
+{% highlight java %}
+package wikiedits;
+
+public class WikipediaAnalysis {
+
+public static void main(String[] args) throws Exception {
+
+}
+}
+{% endhighlight %}
+
+I admit it's very bare bones now, but we will fill it in as we go. Note that
I won't give
+import statements here since IDEs can add them automatically. At the end
of this section I'll show
+the complete code with import statements, in case you simply want to skip ahead
and enter it in your
+editor.
+
+The first step in a Flink program is to create a 
`StreamExecutionEnvironment`
+(or `ExecutionEnvironment` if you are writing a batch job). This can be 
used to set execution
+parameters and create sources for reading from external systems. So let's 
go ahead and add
+this to the main method:
+
+{% highlight java %}
+StreamExecutionEnvironment see = 
StreamExecutionEnvironment.getExecutionEnvironment();
+{% endhighlight %}
+
+Next, we will create a source that reads from the Wikipedia IRC log:
+
+{% highlight java %}
+DataStream<WikipediaEditEvent> edits = 


2016-03-08 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15185039#comment-15185039
 ] 

ASF GitHub Bot commented on FLINK-3591:
---

Github user vasia commented on a diff in the pull request:

https://github.com/apache/flink/pull/1774#discussion_r55371362
  
--- Diff: docs/quickstart/run_example_quickstart.md ---
@@ -27,116 +27,360 @@ under the License.
 * This will be replaced by the TOC
 {:toc}
 
-This guide walks you through the steps of executing an example program 
([K-Means clustering](http://en.wikipedia.org/wiki/K-means_clustering)) on 
Flink. 
-On the way, you will see the a visualization of the program, the optimized 
execution plan, and track the progress of its execution.
+In this guide we will start from scratch and go from setting up a Flink
project to running
+a streaming analysis program on a Flink cluster.
+
+Wikipedia provides an IRC channel where all edits to the wiki are logged. 
We are going to
+read this channel in Flink and count the number of bytes that each user 
edits within
+a given window of time. This is easy enough to implement in a few minutes 
using Flink but it will
+give you a good foundation from which to start building more complex 
analysis programs on your own.
+
+## Setting up a Maven Project
+
+We are going to use a Flink Maven Archetype for creating our project 
structure. Please
+see [Java API Quickstart]({{ site.baseurl 
}}/quickstart/java_api_quickstart.html) for more details
+about this. For our purposes, the command to run is this:
+
+{% highlight bash %}
+$ mvn archetype:generate\
+-DarchetypeGroupId=org.apache.flink\
+-DarchetypeArtifactId=flink-quickstart-java\
+-DarchetypeVersion=1.0.0\
+-DgroupId=wiki-edits\
+-DartifactId=wiki-edits\
+-Dversion=0.1\
+-Dpackage=wikiedits\
+-DinteractiveMode=false
+{% endhighlight %}
+
+You can edit the `groupId`, `artifactId` and `package` if you like. With 
the above parameters,
+maven will create a project structure that looks like this:
+
+{% highlight bash %}
+$ tree wiki-edits
+wiki-edits/
+├── pom.xml
+└── src
+└── main
+├── java
+│   └── wikiedits
+│   ├── Job.java
+│   ├── SocketTextStreamWordCount.java
+│   └── WordCount.java
+└── resources
+└── log4j.properties
+{% endhighlight %}
+
+In the root directory we now have the `pom.xml` file with the Flink dependencies
already added, and
+several example Flink programs in `src/main/java`. We can delete the
example programs, since
+we are going to start from scratch:
+
+{% highlight bash %}
+$ rm wiki-edits/src/main/java/wikiedits/*.java
+{% endhighlight %}
+
+As a last step we need to add the Flink wikipedia connector as a 
dependency so that we can
+use it in our program. Edit the `dependencies` section so that it looks 
like this:
+
+{% highlight xml %}
+<dependencies>
+  <dependency>
+    <groupId>org.apache.flink</groupId>
+    <artifactId>flink-java</artifactId>
+    <version>${flink.version}</version>
+  </dependency>
+  <dependency>
+    <groupId>org.apache.flink</groupId>
+    <artifactId>flink-streaming-java_2.10</artifactId>
+    <version>${flink.version}</version>
+  </dependency>
+  <dependency>
+    <groupId>org.apache.flink</groupId>
+    <artifactId>flink-clients_2.10</artifactId>
+    <version>${flink.version}</version>
+  </dependency>
+  <dependency>
+    <groupId>org.apache.flink</groupId>
+    <artifactId>flink-connector-wikiedits_2.10</artifactId>
+    <version>${flink.version}</version>
+  </dependency>
+</dependencies>
+{% endhighlight %}
+
+Notice the `flink-connector-wikiedits_2.10` dependency that was added.
+
+## Writing a Flink Program
+
+It's coding time. Fire up your favorite IDE and import the Maven project 
or open a text editor and
+create the file `src/main/java/wikiedits/WikipediaAnalysis.java`:
+
+{% highlight java %}
+package wikiedits;
+
+public class WikipediaAnalysis {
+
+public static void main(String[] args) throws Exception {
+
+}
+}
+{% endhighlight %}
+
+I admit it's very bare bones now, but we will fill it in as we go. Note that
I won't give
+import statements here since IDEs can add them automatically. At the end
of this section I'll show
+the complete code with import statements, in case you simply want to skip ahead
and enter it in your
+editor.
+
+The first step in a Flink program is to create a 
`StreamExecutionEnvironment`
+(or `ExecutionEnvironment` if you are writing a batch job). This can be 
used to set execution
+parameters and create sources for reading from external systems. So let's 
go ahead and add
+this to the main method:
+
+{% highlight java %}
+StreamExecutionEnvironment see = 
StreamExecutionEnvironment.getExecutionEnvironment();
+{% endhighlight %}

[GitHub] flink pull request: [FLINK-3591] Replace Quickstart K-Means Exampl...

2016-03-08 Thread aljoscha
GitHub user aljoscha opened a pull request:

https://github.com/apache/flink/pull/1774

[FLINK-3591] Replace Quickstart K-Means Example by Streaming Example

I attached a rough PDF rendering of the changed page.

[Apache Flink 1.1-SNAPSHOT Documentation: Quick Start: Monitoring the 
Wikipedia Edit 
Stream.pdf](https://github.com/apache/flink/files/163347/Apache.Flink.1.1-SNAPSHOT.Documentation.Quick.Start.Monitoring.the.Wikipedia.Edit.Stream.pdf)


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/aljoscha/flink doc-fix-quickstart-example

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/1774.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1774


commit ce53fa70512d7817aed64ec30bd056567c7c4f55
Author: Aljoscha Krettek 
Date:   2016-03-08T14:49:09Z

[FLINK-3591] Replace Quickstart K-Means Example by Streaming Example




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink pull request: [FLINK-3591] Replace Quickstart K-Means Exampl...

2016-03-08 Thread vasia
Github user vasia commented on a diff in the pull request:

https://github.com/apache/flink/pull/1774#discussion_r55371362
  
--- Diff: docs/quickstart/run_example_quickstart.md ---
[...]

[jira] [Commented] (FLINK-3591) Replace Quickstart K-Means Example by Streaming Example

2016-03-08 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15185041#comment-15185041
 ] 

ASF GitHub Bot commented on FLINK-3591:
---

Github user vasia commented on a diff in the pull request:

https://github.com/apache/flink/pull/1774#discussion_r55371429
  
--- Diff: docs/quickstart/run_example_quickstart.md ---
[...]

[jira] [Commented] (FLINK-3444) env.fromElements relies on the first input element for determining the DataSet/DataStream type

2016-03-08 Thread Chesnay Schepler (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3444?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15185030#comment-15185030
 ] 

Chesnay Schepler commented on FLINK-3444:
-

We could add a fromElements method that takes a TypeInformation or Class 
argument in addition to the values. Should be the easiest and safest way to 
solve this issue.
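
For example, a rough sketch of how such an overload could look from the user side (the exact signature here is hypothetical):

{code}
// explicitly supply the element type instead of inferring it from the first element
DataStream<Event> input = env.fromElements(Event.class, new SubEvent(1, "a"), new Event(2, "b"));
{code}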

> env.fromElements relies on the first input element for determining the 
> DataSet/DataStream type
> --
>
> Key: FLINK-3444
> URL: https://issues.apache.org/jira/browse/FLINK-3444
> Project: Flink
>  Issue Type: Bug
>  Components: DataSet API, DataStream API
>Affects Versions: 0.10.0, 1.0.0
>Reporter: Vasia Kalavri
>
> The {{fromElements}} method of the {{ExecutionEnvironment}} and 
> {{StreamExecutionEnvironment}} determines the DataSet/DataStream type by 
> extracting the type of the first input element.
> This is problematic if the first element is a subtype of another element in 
> the collection.
> For example, the following
> {code}
> DataStream<Event> input = env.fromElements(new Event(1, "a"), new SubEvent(2, 
> "b"));
> {code}
> succeeds, while the following
> {code}
> DataStream<Event> input = env.fromElements(new SubEvent(1, "a"), new Event(2, 
> "b"));
> {code}
> fails with "java.lang.IllegalArgumentException: The elements in the 
> collection are not all subclasses of SubEvent".



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (FLINK-3519) Subclasses of Tuples don't work if the declared type of a DataSet is not the descendant

2016-03-08 Thread Chesnay Schepler (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-3519?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chesnay Schepler updated FLINK-3519:

Assignee: Gabor Gevay

> Subclasses of Tuples don't work if the declared type of a DataSet is not the 
> descendant
> ---
>
> Key: FLINK-3519
> URL: https://issues.apache.org/jira/browse/FLINK-3519
> Project: Flink
>  Issue Type: Bug
>  Components: Type Serialization System
>Affects Versions: 1.0.0
>Reporter: Gabor Gevay
>Assignee: Gabor Gevay
>Priority: Minor
>
> If I have a subclass of TupleN, then objects of this type will turn into 
> TupleNs when I try to use them in a DataSet.
> For example, if I have a class like this:
> {code}
> public static class Foo extends Tuple1<Integer> {
>   public short a;
>   public Foo() {}
>   public Foo(int f0, int a) {
>   this.f0 = f0;
>   this.a = (short)a;
>   }
>   @Override
>   public String toString() {
>   return "(" + f0 + ", " + a + ")";
>   }
> }
> {code}
> And then I do this:
> {code}
> env.fromElements(0, 0, 0).map(new MapFunction<Integer, Tuple1<Integer>>() {
>   @Override
>   public Tuple1<Integer> map(Integer value) throws Exception {
>   return new Foo(5, 6);
>   }
> }).print();
> {code}
> Then I don't have Foos in the output, but only Tuples:
> {code}
> (5)
> (5)
> (5)
> {code}
> The problem is caused by the TupleSerializer not caring about subclasses at 
> all. I guess the reason for this is performance: we don't want to deal with 
> writing and reading subclass tags when we have Tuples.
> I see three options for solving this:
> 1. Add subclass tags to the TupleSerializer: This is not really an option, 
> because we don't want to lose performance.
> 2. Document this behavior in the javadoc of the Tuple classes.
> 3. Make the Tuple types final: this would be the clean solution, but it is 
> API breaking, and the first victim would be Gelly: the Vertex and Edge types 
> extend from tuples. (Note that the issue doesn't appear there, because the 
> DataSets there always have the type of the descendant class.)
> When deciding between 2. and 3., an important point to note is that if you 
> have your class extend from a Tuple type instead of just adding the f0, f1, 
> ... fields manually in the hopes of getting the performance boost associated 
> with Tuples, then you are out of luck: the PojoSerializer will kick in anyway 
> when the declared types of your DataSets are the descendant type.
> If someone knows about a good reason to extend from a Tuple class, then 
> please comment.
> For 2., this is a suggested wording for the javadoc of the Tuple classes:
> Warning: Please don't subclass Tuple classes, but if you do, then be sure to 
> always declare the element type of your DataSets to your descendant type. 
> (That is, if you have a "class A extends Tuple2", then don't use instances of 
> A in a DataSet<Tuple2>, but use DataSet<A>.)
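
As an illustration of the workaround in the last point, a sketch reusing the {{Foo}} class from above (hypothetical test code, not from the issue):

{code}
// declaring the DataSet with the descendant type makes the PojoSerializer kick in,
// so the Foo instances (including the extra field "a") are preserved:
DataSet<Foo> foos = env.fromElements(0, 0, 0).map(new MapFunction<Integer, Foo>() {
    @Override
    public Foo map(Integer value) throws Exception {
        return new Foo(5, 6);
    }
});
foos.print(); // prints (5, 6) instead of (5)
{code}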



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-3591) Replace Quickstart K-Means Example by Streaming Example

2016-03-08 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15185032#comment-15185032
 ] 

ASF GitHub Bot commented on FLINK-3591:
---

Github user vasia commented on a diff in the pull request:

https://github.com/apache/flink/pull/1774#discussion_r55371080
  
--- Diff: docs/quickstart/run_example_quickstart.md ---
[...]

[GitHub] flink pull request: [FLINK-3591] Replace Quickstart K-Means Exampl...

2016-03-08 Thread vasia
Github user vasia commented on a diff in the pull request:

https://github.com/apache/flink/pull/1774#discussion_r55371080
  
--- Diff: docs/quickstart/run_example_quickstart.md ---
[...]

[jira] [Commented] (FLINK-3591) Replace Quickstart K-Means Example by Streaming Example

2016-03-08 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15185026#comment-15185026
 ] 

ASF GitHub Bot commented on FLINK-3591:
---

Github user vasia commented on a diff in the pull request:

https://github.com/apache/flink/pull/1774#discussion_r55370936
  
--- Diff: docs/quickstart/run_example_quickstart.md ---
[...]

[GitHub] flink pull request: [FLINK-3591] Replace Quickstart K-Means Exampl...

2016-03-08 Thread vasia
Github user vasia commented on a diff in the pull request:

https://github.com/apache/flink/pull/1774#discussion_r55370936
  
--- Diff: docs/quickstart/run_example_quickstart.md ---
[...]

[jira] [Commented] (FLINK-3591) Replace Quickstart K-Means Example by Streaming Example

2016-03-08 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15185018#comment-15185018
 ] 

ASF GitHub Bot commented on FLINK-3591:
---

Github user vasia commented on a diff in the pull request:

https://github.com/apache/flink/pull/1774#discussion_r55370299
  
--- Diff: docs/quickstart/run_example_quickstart.md ---
[...]
+It's coding time. Fire up your favorite IDE and import the Maven project 
or open a text editor and
+create the file `src/main/wikiedits/WikipediaAnalysis.java`:
--- End diff --

src/main/java/...?


> Replace Quickstart K-Means Example by Streaming Example
> ---
>
> Key: FLINK-3591
> URL: https://issues.apache.org/jira/browse/FLINK-3591
> Project: Flink
>  Issue Type: Improvement
>  Components: Documentation
>Reporter: Aljoscha Krettek
>Assignee: Aljoscha Krettek
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: [FLINK-3591] Replace Quickstart K-Means Exampl...

2016-03-08 Thread vasia
Github user vasia commented on a diff in the pull request:

https://github.com/apache/flink/pull/1774#discussion_r55370299
  
--- Diff: docs/quickstart/run_example_quickstart.md ---
[...]
+It's coding time. Fire up your favorite IDE and import the Maven project 
or open a text editor and
+create the file `src/main/wikiedits/WikipediaAnalysis.java`:
--- End diff --

src/main/java/...?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink pull request: [FLINK-3591] Replace Quickstart K-Means Exampl...

2016-03-08 Thread vasia
Github user vasia commented on a diff in the pull request:

https://github.com/apache/flink/pull/1774#discussion_r55369437
  
--- Diff: docs/quickstart/run_example_quickstart.md ---
@@ -27,116 +27,360 @@ under the License.
 * This will be replaced by the TOC
 {:toc}
 
-This guide walks you through the steps of executing an example program 
([K-Means clustering](http://en.wikipedia.org/wiki/K-means_clustering)) on 
Flink. 
-On the way, you will see the a visualization of the program, the optimized 
execution plan, and track the progress of its execution.
+In this guide we will start from scratch and fo from setting up a Flink 
project and running
--- End diff --

fo => go


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-3591) Replace Quickstart K-Means Example by Streaming Example

2016-03-08 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15185009#comment-15185009
 ] 

ASF GitHub Bot commented on FLINK-3591:
---

Github user vasia commented on a diff in the pull request:

https://github.com/apache/flink/pull/1774#discussion_r55369437
  
--- Diff: docs/quickstart/run_example_quickstart.md ---
@@ -27,116 +27,360 @@ under the License.
 * This will be replaced by the TOC
 {:toc}
 
-This guide walks you through the steps of executing an example program 
([K-Means clustering](http://en.wikipedia.org/wiki/K-means_clustering)) on 
Flink. 
-On the way, you will see the a visualization of the program, the optimized 
execution plan, and track the progress of its execution.
+In this guide we will start from scratch and fo from setting up a Flink 
project and running
--- End diff --

fo => go


> Replace Quickstart K-Means Example by Streaming Example
> ---
>
> Key: FLINK-3591
> URL: https://issues.apache.org/jira/browse/FLINK-3591
> Project: Flink
>  Issue Type: Improvement
>  Components: Documentation
>Reporter: Aljoscha Krettek
>Assignee: Aljoscha Krettek
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-3591) Replace Quickstart K-Means Example by Streaming Example

2016-03-08 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15185000#comment-15185000
 ] 

ASF GitHub Bot commented on FLINK-3591:
---

GitHub user aljoscha opened a pull request:

https://github.com/apache/flink/pull/1774

[FLINK-3591] Replace Quickstart K-Means Example by Streaming Example

I attached a rough pdf rendering of the changed page.

[Apache Flink 1.1-SNAPSHOT Documentation: Quick Start: Monitoring the 
Wikipedia Edit 
Stream.pdf](https://github.com/apache/flink/files/163347/Apache.Flink.1.1-SNAPSHOT.Documentation.Quick.Start.Monitoring.the.Wikipedia.Edit.Stream.pdf)


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/aljoscha/flink doc-fix-quickstart-example

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/1774.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1774


commit ce53fa70512d7817aed64ec30bd056567c7c4f55
Author: Aljoscha Krettek 
Date:   2016-03-08T14:49:09Z

[FLINK-3591] Replace Quickstart K-Means Example by Streaming Example




> Replace Quickstart K-Means Example by Streaming Example
> ---
>
> Key: FLINK-3591
> URL: https://issues.apache.org/jira/browse/FLINK-3591
> Project: Flink
>  Issue Type: Improvement
>  Components: Documentation
>Reporter: Aljoscha Krettek
>Assignee: Aljoscha Krettek
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (FLINK-3591) Replace Quickstart K-Means Example by Streaming Example

2016-03-08 Thread Aljoscha Krettek (JIRA)
Aljoscha Krettek created FLINK-3591:
---

 Summary: Replace Quickstart K-Means Example by Streaming Example
 Key: FLINK-3591
 URL: https://issues.apache.org/jira/browse/FLINK-3591
 Project: Flink
  Issue Type: Improvement
  Components: Documentation
Reporter: Aljoscha Krettek
Assignee: Aljoscha Krettek






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-3590) JDBC Format tests don't hide derby logs

2016-03-08 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15184936#comment-15184936
 ] 

ASF GitHub Bot commented on FLINK-3590:
---

GitHub user zentol opened a pull request:

https://github.com/apache/flink/pull/1773

[FLINK-3590] Hide derby log in JDBC tests

Hides derby logs by redirecting the log stream to a dummy stream
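
A minimal sketch of the mechanism: Derby resolves the {{derby.stream.error.field}} system property to a public static {{OutputStream}} field and writes its log there (the class and package names below are illustrative, not necessarily those used in the patch):

{code}
public class DevNullLogStream {
    // Derby writes its boot log to this stream; discard everything
    public static final java.io.OutputStream DEV_NULL = new java.io.OutputStream() {
        @Override
        public void write(int b) {}
    };
}

// e.g. in the test setup:
System.setProperty("derby.stream.error.field",
        "org.apache.flink.api.java.io.jdbc.DevNullLogStream.DEV_NULL");
{code}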

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/zentol/flink 3590_jdbc_derby

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/1773.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1773


commit c998b29391c08ddd359fdaf0574c063e48773678
Author: zentol 
Date:   2016-03-08T14:02:14Z

[FLINK-3590] Hide derby log in JDBC tests




> JDBC Format tests don't hide derby logs
> ---
>
> Key: FLINK-3590
> URL: https://issues.apache.org/jira/browse/FLINK-3590
> Project: Flink
>  Issue Type: Improvement
>  Components: Batch, Tests
>Affects Versions: 1.0.0
>Reporter: Chesnay Schepler
>Assignee: Chesnay Schepler
>Priority: Minor
> Fix For: 1.1.0
>
>
> The batch jdbc IO-format tests print the derby log, due to some refactoring 
> going wrong.
> {code}
>  T E S T S
> ---
> Running org.apache.flink.api.java.io.jdbc.JDBCInputFormatTest
> Tue Mar 08 12:17:03 UTC 2016 Thread[main,5,main] 
> java.lang.ClassNotFoundException: 
> org.apache.flink.api.java.record.io.jdbc.DevNullLogStream
> 
> Tue Mar 08 12:17:03 UTC 2016:
> Booting Derby version The Apache Software Foundation - Apache Derby - 
> 10.10.1.1 - (1458268): instance a816c00e-0153-5628-ba57-0d988ed0 
> on database directory 
> memory:/home/travis/build/zentol/flink/flink-batch-connectors/flink-jdbc/ebookshop
>  with class loader sun.misc.Launcher$AppClassLoader@788bf135 
> Loaded from 
> file:/home/travis/.m2/repository/org/apache/derby/derby/10.10.1.1/derby-10.10.1.1.jar
> java.vendor=Oracle Corporation
> java.runtime.version=1.7.0_76-b13
> user.dir=/home/travis/build/zentol/flink/flink-batch-connectors/flink-jdbc
> os.name=Linux
> os.arch=amd64
> os.version=3.13.0-40-generic
> derby.system.home=null
> derby.stream.error.field=org.apache.flink.api.java.record.io.jdbc.DevNullLogStream.DEV_NULL
> Database Class Loader started - derby.database.classpath=''
> {code}
> We should hide these logs again.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: [FLINK-3590] Hide derby log in JDBC tests

2016-03-08 Thread zentol
GitHub user zentol opened a pull request:

https://github.com/apache/flink/pull/1773

[FLINK-3590] Hide derby log in JDBC tests

Hides derby logs by redirecting the log stream to a dummy stream

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/zentol/flink 3590_jdbc_derby

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/1773.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1773


commit c998b29391c08ddd359fdaf0574c063e48773678
Author: zentol 
Date:   2016-03-08T14:02:14Z

[FLINK-3590] Hide derby log in JDBC tests




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Created] (FLINK-3590) JDBC Format tests don't hide derby logs

2016-03-08 Thread Chesnay Schepler (JIRA)
Chesnay Schepler created FLINK-3590:
---

 Summary: JDBC Format tests don't hide derby logs
 Key: FLINK-3590
 URL: https://issues.apache.org/jira/browse/FLINK-3590
 Project: Flink
  Issue Type: Improvement
  Components: Batch, Tests
Affects Versions: 1.0.0
Reporter: Chesnay Schepler
Assignee: Chesnay Schepler
Priority: Minor
 Fix For: 1.1.0


The batch jdbc IO-format tests print the derby log, due to some refactoring 
going wrong.

{code}
 T E S T S
---
Running org.apache.flink.api.java.io.jdbc.JDBCInputFormatTest
Tue Mar 08 12:17:03 UTC 2016 Thread[main,5,main] 
java.lang.ClassNotFoundException: 
org.apache.flink.api.java.record.io.jdbc.DevNullLogStream

Tue Mar 08 12:17:03 UTC 2016:
Booting Derby version The Apache Software Foundation - Apache Derby - 10.10.1.1 
- (1458268): instance a816c00e-0153-5628-ba57-0d988ed0 
on database directory 
memory:/home/travis/build/zentol/flink/flink-batch-connectors/flink-jdbc/ebookshop
 with class loader sun.misc.Launcher$AppClassLoader@788bf135 
Loaded from 
file:/home/travis/.m2/repository/org/apache/derby/derby/10.10.1.1/derby-10.10.1.1.jar
java.vendor=Oracle Corporation
java.runtime.version=1.7.0_76-b13
user.dir=/home/travis/build/zentol/flink/flink-batch-connectors/flink-jdbc
os.name=Linux
os.arch=amd64
os.version=3.13.0-40-generic
derby.system.home=null
derby.stream.error.field=org.apache.flink.api.java.record.io.jdbc.DevNullLogStream.DEV_NULL
Database Class Loader started - derby.database.classpath=''
{code}

We should hide these logs again.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-3574) Implement math functions for Table API

2016-03-08 Thread Dawid Wysakowicz (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3574?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15184935#comment-15184935
 ] 

Dawid Wysakowicz commented on FLINK-3574:
-

OK, I understand now. I am going to implement it that way later today.

> Implement math functions for Table API
> --
>
> Key: FLINK-3574
> URL: https://issues.apache.org/jira/browse/FLINK-3574
> Project: Flink
>  Issue Type: Sub-task
>  Components: Table API
>Reporter: Timo Walther
>Assignee: Dawid Wysakowicz
>
> {code}
> MOD
> EXP
> POWER
> LN
> LOG10
> ABS
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (FLINK-3589) Allow setting Operator parallelism to default value

2016-03-08 Thread Greg Hogan (JIRA)
Greg Hogan created FLINK-3589:
-

 Summary: Allow setting Operator parallelism to default value
 Key: FLINK-3589
 URL: https://issues.apache.org/jira/browse/FLINK-3589
 Project: Flink
  Issue Type: Improvement
  Components: Java API
Affects Versions: 1.1.0
Reporter: Greg Hogan
Assignee: Greg Hogan
Priority: Minor


Users can override the parallelism for a single operator by calling 
{{Operator.setParallelism}}, which accepts a positive value. {{Operator}} uses 
{{-1}} to indicate default parallelism. It would be nice to name and accept 
this default value.

This would enable user algorithms to allow configurable parallelism while still 
chaining operator methods.

For example, currently:

{code}
private int parallelism;
...
public void setParallelism(int parallelism) {
this.parallelism = parallelism;
}
...
MapOperator<Edge, Edge> newEdges = 
edges
.map(new MyMapFunction())
.name("My map function");

if (parallelism > 0) {
newEdges.setParallelism(parallelism);
}
{code}

Could be simplified to:

{code}
private int parallelism = Operator.DEFAULT_PARALLELISM;
...
public void setParallelism(int parallelism) {
this.parallelism = parallelism;
}
...
DataSet<Edge> newEdges = edges
.map(new MyMapFunction())
.setParallelism(parallelism)
.name("My map function");
{code}
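
A sketch of what the named default might look like in {{Operator}} (the constant name matches the snippet above; the method body here is illustrative, not the actual change):

{code}
public static final int DEFAULT_PARALLELISM = -1;

public O setParallelism(int parallelism) {
    if (parallelism <= 0 && parallelism != DEFAULT_PARALLELISM) {
        throw new IllegalArgumentException(
                "The parallelism must be at least 1, or DEFAULT_PARALLELISM.");
    }
    this.parallelism = parallelism;
    return (O) this; // unchecked cast, as in the existing fluent setters
}
{code}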



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (FLINK-3574) Implement math functions for Table API

2016-03-08 Thread Dawid Wysakowicz (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3574?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15184889#comment-15184889
 ] 

Dawid Wysakowicz edited comment on FLINK-3574 at 3/8/16 1:11 PM:
-

I thought about logic as you described in {{getCallGenerator}}, but right 
now we can't specify a signature with a NUMERIC type because {{NumericTypeInfo}} is 
an abstract class and thus cannot be instantiated. To give an example, I would 
like to write something like the following, which unfortunately is impossible:

{code}
  addSqlFunctionMethod(
LOG10,
Seq(NUMERIC_TYPE_INFO),
DOUBLE_TYPE_INFO,
BuiltInMethod.Log10.method)
{code}

Regarding the two ways of processing, I meant something different. Have a look at 
these two code snippets:

RexNodeTranslator:
{code}
  case UnaryMinus(child) =>
val c = toRexNode(child, relBuilder)
relBuilder.call(SqlStdOperatorTable.UNARY_MINUS, c)

  // Scalar functions
  case Call(name, args@_*) =>
val rexArgs = args.map(toRexNode(_, relBuilder))
val sqlOperator = toSqlOperator(name)
relBuilder.call(sqlOperator, rexArgs)
{code}

CodeGenerator:
{code}
  case OR =>
operands.reduceLeft { (left: GeneratedExpression, right: 
GeneratedExpression) =>
  requireBoolean(left)
  requireBoolean(right)
  generateOr(nullCheck, left, right)
}

  case call: SqlOperator =>
val callGen = ScalarFunctions.getCallGenerator(call, 
operands.map(_.resultType))
callGen
  .getOrElse(throw new CodeGenException(s"Unsupported call: $call"))
  .generate(this, operands)
{code}

In both snippets, UnaryMinus and OR could be expressed in the case call block 
using an appropriate CallGenerator. My question is: should we replace the whole match 
block with the invocation of {{getCallGenerator}}?


was (Author: dawidwys):
I thought about logic as you described in {{getCallGenerator}}, but right 
now we can't specify a signature with a NUMERIC type because {{NumericTypeInfo}} is 
an abstract class and thus cannot be instantiated.

Regarding the two ways of processing, I meant something different. Have a look at 
these two code snippets:

RexNodeTranslator:
{code}
  case UnaryMinus(child) =>
val c = toRexNode(child, relBuilder)
relBuilder.call(SqlStdOperatorTable.UNARY_MINUS, c)

  // Scalar functions
  case Call(name, args@_*) =>
val rexArgs = args.map(toRexNode(_, relBuilder))
val sqlOperator = toSqlOperator(name)
relBuilder.call(sqlOperator, rexArgs)
{code}

CodeGenerator:
{code}
  case OR =>
operands.reduceLeft { (left: GeneratedExpression, right: 
GeneratedExpression) =>
  requireBoolean(left)
  requireBoolean(right)
  generateOr(nullCheck, left, right)
}

  case call: SqlOperator =>
val callGen = ScalarFunctions.getCallGenerator(call, 
operands.map(_.resultType))
callGen
  .getOrElse(throw new CodeGenException(s"Unsupported call: $call"))
  .generate(this, operands)
{code}

In both snippets, UnaryMinus and OR could be expressed in the case call block 
using an appropriate CallGenerator. My question is: should we replace the whole match 
block with the invocation of {{getCallGenerator}}?

> Implement math functions for Table API
> --
>
> Key: FLINK-3574
> URL: https://issues.apache.org/jira/browse/FLINK-3574
> Project: Flink
>  Issue Type: Sub-task
>  Components: Table API
>Reporter: Timo Walther
>Assignee: Dawid Wysakowicz
>
> {code}
> MOD
> EXP
> POWER
> LN
> LOG10
> ABS
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (FLINK-3573) Implement more String functions for Table API

2016-03-08 Thread Timo Walther (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-3573?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Timo Walther resolved FLINK-3573.
-
   Resolution: Implemented
Fix Version/s: 1.1.0

> Implement more String functions for Table API
> -
>
> Key: FLINK-3573
> URL: https://issues.apache.org/jira/browse/FLINK-3573
> Project: Flink
>  Issue Type: Sub-task
>  Components: Table API
>Reporter: Timo Walther
>Assignee: Timo Walther
> Fix For: 1.1.0
>
>
> Implement the remaining string functions:
> {code}
> CHARACTER_LENGTH
> CONCAT
> UPPER
> LOWER
> INITCAP
> LIKE
> SIMILAR
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-3574) Implement math functions for Table API

2016-03-08 Thread Dawid Wysakowicz (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3574?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15184889#comment-15184889
 ] 

Dawid Wysakowicz commented on FLINK-3574:
-

I thought about logic as you described in {{getCallGenerator}}, but right 
now we can't specify a signature with a NUMERIC type because {{NumericTypeInfo}} is 
an abstract class and thus cannot be instantiated.

Regarding the two ways of processing, I meant something different. Have a look at 
these two code snippets:

RexNodeTranslator:
{code}
  case UnaryMinus(child) =>
val c = toRexNode(child, relBuilder)
relBuilder.call(SqlStdOperatorTable.UNARY_MINUS, c)

  // Scalar functions
  case Call(name, args@_*) =>
val rexArgs = args.map(toRexNode(_, relBuilder))
val sqlOperator = toSqlOperator(name)
relBuilder.call(sqlOperator, rexArgs)
{code}

CodeGenerator:
{code}
  case OR =>
    operands.reduceLeft { (left: GeneratedExpression, right: GeneratedExpression) =>
      requireBoolean(left)
      requireBoolean(right)
      generateOr(nullCheck, left, right)
    }

  case call: SqlOperator =>
    val callGen = ScalarFunctions.getCallGenerator(call, operands.map(_.resultType))
    callGen
      .getOrElse(throw new CodeGenException(s"Unsupported call: $call"))
      .generate(this, operands)
{code}

In both snippets, UnaryMinus and OR could be expressed in the {{case call}} block 
using an appropriate CallGenerator. My question is: should we replace the whole 
match block with an invocation of {{getCallGenerator}}?
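
To make the question concrete, here is a self-contained toy model of that 
consolidation (plain Scala; {{GeneratedExpression}} is stood in for by String, 
and the registry is hypothetical, not the actual Flink types):

{code}
// Toy model: every operator, including UNARY_MINUS and OR, has a
// registered call generator, so the big match block collapses to a
// single lookup.
object CallGeneratorDispatch {

  type Operand = String                    // stands in for GeneratedExpression
  type CallGenerator = Seq[Operand] => String

  // Hypothetical registry; in Flink this would live in ScalarFunctions.
  private val generators: Map[String, CallGenerator] = Map(
    "UNARY_MINUS" -> (ops => s"-(${ops.head})"),
    "OR" -> (ops => ops.reduceLeft((l, r) => s"($l || $r)")))

  def getCallGenerator(name: String): Option[CallGenerator] =
    generators.get(name)

  // The former per-operator cases reduce to this one path.
  def generate(name: String, operands: Seq[Operand]): String =
    getCallGenerator(name)
      .getOrElse(throw new RuntimeException(s"Unsupported call: $name"))
      .apply(operands)

  def main(args: Array[String]): Unit = {
    println(generate("OR", Seq("a", "b", "c")))  // prints ((a || b) || c)
    println(generate("UNARY_MINUS", Seq("x")))   // prints -(x)
  }
}
{code}

With such a registry, adding a new operator would mean registering a 
CallGenerator instead of extending the match block.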

> Implement math functions for Table API
> --
>
> Key: FLINK-3574
> URL: https://issues.apache.org/jira/browse/FLINK-3574
> Project: Flink
>  Issue Type: Sub-task
>  Components: Table API
>Reporter: Timo Walther
>Assignee: Dawid Wysakowicz
>
> {code}
> MOD
> EXP
> POWER
> LN
> LOG10
> ABS
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-3573) Implement more String functions for Table API

2016-03-08 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15184884#comment-15184884
 ] 

ASF GitHub Bot commented on FLINK-3573:
---

Github user twalthr closed the pull request at:

https://github.com/apache/flink/pull/1766


> Implement more String functions for Table API
> -
>
> Key: FLINK-3573
> URL: https://issues.apache.org/jira/browse/FLINK-3573
> Project: Flink
>  Issue Type: Sub-task
>  Components: Table API
>Reporter: Timo Walther
>Assignee: Timo Walther
>
> Implement the remaining string functions:
> {code}
> CHARACTER_LENGTH
> CONCAT
> UPPER
> LOWER
> INITCAP
> LIKE
> SIMILAR
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: [FLINK-3573] [table] Implement more String fun...

2016-03-08 Thread twalthr
Github user twalthr closed the pull request at:

https://github.com/apache/flink/pull/1766


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-3573) Implement more String functions for Table API

2016-03-08 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15184881#comment-15184881
 ] 

ASF GitHub Bot commented on FLINK-3573:
---

Github user twalthr commented on the pull request:

https://github.com/apache/flink/pull/1766#issuecomment-193777848
  
Merging this...


> Implement more String functions for Table API
> -
>
> Key: FLINK-3573
> URL: https://issues.apache.org/jira/browse/FLINK-3573
> Project: Flink
>  Issue Type: Sub-task
>  Components: Table API
>Reporter: Timo Walther
>Assignee: Timo Walther
>
> Implement the remaining string functions:
> {code}
> CHARACTER_LENGTH
> CONCAT
> UPPER
> LOWER
> INITCAP
> LIKE
> SIMILAR
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: [FLINK-3573] [table] Implement more String fun...

2016-03-08 Thread twalthr
Github user twalthr commented on the pull request:

https://github.com/apache/flink/pull/1766#issuecomment-193777848
  
Merging this...


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Created] (FLINK-3588) Add a streaming (exactly-once) JDBC connector

2016-03-08 Thread Chesnay Schepler (JIRA)
Chesnay Schepler created FLINK-3588:
---

 Summary: Add a streaming (exactly-once) JDBC connector
 Key: FLINK-3588
 URL: https://issues.apache.org/jira/browse/FLINK-3588
 Project: Flink
  Issue Type: Improvement
  Components: Streaming Connectors
Affects Versions: 1.0.0
Reporter: Chesnay Schepler
Assignee: Chesnay Schepler
 Fix For: 1.1.0






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-3574) Implement math functions for Table API

2016-03-08 Thread Timo Walther (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3574?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15184870#comment-15184870
 ] 

Timo Walther commented on FLINK-3574:
-

Maybe we need some casting/fallback logic in {{getCallGenerator}} to find a 
suitable function: first check the entire signature, then check the signature 
with casting. {{addSqlFunctionMethod}} just determines which types the final Java 
method needs. The design of the {{ScalarFunctions}} class is not complete yet, as 
we also want to integrate custom user-defined functions later.
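
For illustration, a self-contained sketch of that two-step lookup (the type and 
cast tables below are assumptions for the example, not the real Table API type 
system):

{code}
object FallbackLookup {

  sealed trait TypeInfo
  case object IntType extends TypeInfo
  case object DoubleType extends TypeInfo

  // Allowed implicit casts, e.g. INT widens to DOUBLE.
  private val implicitCasts: Map[TypeInfo, TypeInfo] = Map(IntType -> DoubleType)

  // Registered signatures, e.g. POWER(DOUBLE, DOUBLE).
  private val signatures: Set[(String, Seq[TypeInfo])] =
    Set(("POWER", Seq(DoubleType, DoubleType)))

  // Step 1: exact signature match; step 2: retry with widened operands.
  def lookup(name: String, operands: Seq[TypeInfo]): Option[Seq[TypeInfo]] =
    if (signatures.contains((name, operands))) {
      Some(operands)
    } else {
      val widened = operands.map(t => implicitCasts.getOrElse(t, t))
      if (signatures.contains((name, widened))) Some(widened) else None
    }

  def main(args: Array[String]): Unit = {
    // POWER(INT, INT) only resolves via the cast fallback, as POWER(DOUBLE, DOUBLE).
    println(lookup("POWER", Seq(IntType, IntType)))  // Some(List(DoubleType, DoubleType))
    println(lookup("POWER", Seq(IntType)))           // None (arity mismatch)
  }
}
{code}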

RexNodeTranslator is the interface between the Table API and the Calcite stack. 
However, we will have a SQL API in the near future that won't go through the 
RexNodeTranslator but will go directly to the CodeGenerator.

> Implement math functions for Table API
> --
>
> Key: FLINK-3574
> URL: https://issues.apache.org/jira/browse/FLINK-3574
> Project: Flink
>  Issue Type: Sub-task
>  Components: Table API
>Reporter: Timo Walther
>Assignee: Dawid Wysakowicz
>
> {code}
> MOD
> EXP
> POWER
> LN
> LOG10
> ABS
> {code}
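
As a usage illustration, a hedged sketch in the Scala Table API (the method names 
mirror the SQL functions above and are assumptions, not confirmed API):

{code}
import org.apache.flink.api.scala._
import org.apache.flink.api.scala.table._

val env = ExecutionEnvironment.getExecutionEnvironment

// 'v is a Double field, 'id an Int field; the method names are assumptions.
val values = env.fromElements((1.5, 11), (64.0, 25)).as('v, 'id)

val result = values.select(
  'v.abs(),
  'v.exp(),
  'v.ln(),
  'v.log10(),
  'v.power(2),
  'id.mod(10))
{code}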



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Closed] (FLINK-3169) Drop {{Key}} type from Record Data Model

2016-03-08 Thread Chesnay Schepler (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-3169?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chesnay Schepler closed FLINK-3169.
---
Resolution: Fixed

Removed in fe117afd31e823006f680d145c8f75033816ed17

> Drop {{Key}} type from Record Data Model
> 
>
> Key: FLINK-3169
> URL: https://issues.apache.org/jira/browse/FLINK-3169
> Project: Flink
>  Issue Type: Improvement
>Affects Versions: 0.10.1
>Reporter: Stephan Ewen
>Assignee: Chesnay Schepler
> Fix For: 1.0.0
>
>
> The {{Key}} type is currently still used by
>   - The RecordComparator: this can be moved to the test scope of {{flink-runtime}}
>   - The range partitioner distributions: they are outdated anyway and no longer used
>   - All value types can be moved to {{NormalizableKey}}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

