[GitHub] flink pull request: [FLINK-1731] [ml] Implementation of Feature K-...

2015-06-25 Thread thvasilo
Github user thvasilo commented on the pull request:

https://github.com/apache/flink/pull/700#issuecomment-115133956
  
Thanks, seems like all is fine now. We will start reviewing this in the 
next few days.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-1731) Add kMeans clustering algorithm to machine learning library

2015-06-25 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14600798#comment-14600798
 ] 

ASF GitHub Bot commented on FLINK-1731:
---

Github user thvasilo commented on the pull request:

https://github.com/apache/flink/pull/700#issuecomment-115133956
  
Thanks, seems like all is fine now. We will start reviewing this in the 
next few days.


> Add kMeans clustering algorithm to machine learning library
> ---
>
> Key: FLINK-1731
> URL: https://issues.apache.org/jira/browse/FLINK-1731
> Project: Flink
>  Issue Type: New Feature
>  Components: Machine Learning Library
>Reporter: Till Rohrmann
>Assignee: Peter Schrott
>  Labels: ML
>
> The Flink repository already contains a kMeans implementation, but it is not 
> yet ported to the machine learning library. I assume that only the used data 
> types have to be adapted and then it can be more or less directly moved to 
> flink-ml.
> The kMeans++ [1] and kMeans|| [2] algorithms constitute better 
> implementations because they improve the initial seeding phase to achieve 
> near-optimal clustering. It might be worthwhile to implement kMeans||.
> Resources:
> [1] http://ilpubs.stanford.edu:8090/778/1/2006-13.pdf
> [2] http://theory.stanford.edu/~sergei/papers/vldb12-kmpar.pdf
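The seeding improvement mentioned above can be sketched compactly. The following is a hypothetical, self-contained illustration of the k-means++ idea from [1], not Flink code; all names are invented for this sketch. The first center is drawn uniformly at random, and each further center is drawn with probability proportional to its squared distance from the nearest already-chosen center.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

class KMeansPlusPlusSeeding {
    // squared Euclidean distance between two points
    static double sqDist(double[] a, double[] b) {
        double s = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            s += d * d;
        }
        return s;
    }

    static List<double[]> chooseCenters(List<double[]> points, int k, Random rnd) {
        List<double[]> centers = new ArrayList<>();
        // first center: uniform at random
        centers.add(points.get(rnd.nextInt(points.size())));
        while (centers.size() < k) {
            // weight each point by squared distance to its nearest center
            double[] weights = new double[points.size()];
            double total = 0.0;
            for (int i = 0; i < points.size(); i++) {
                double min = Double.MAX_VALUE;
                for (double[] c : centers) {
                    min = Math.min(min, sqDist(points.get(i), c));
                }
                weights[i] = min;
                total += min;
            }
            // sample the next center proportionally to the weights
            double r = rnd.nextDouble() * total;
            int next = points.size() - 1;
            for (int i = 0; i < points.size(); i++) {
                r -= weights[i];
                if (r <= 0) { next = i; break; }
            }
            centers.add(points.get(next));
        }
        return centers;
    }
}
```

kMeans|| [2] parallelizes exactly this sampling step by oversampling several candidates per round, which is what makes it attractive for a distributed setting.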



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: [FLINK-2152] Added zipWithIndex

2015-06-25 Thread andralungu
Github user andralungu commented on the pull request:

https://github.com/apache/flink/pull/832#issuecomment-115134406
  
Hey Theo, 

Thanks a lot for finding my bug there ^^
PR updated to address the Java issues and to contain a pimped Scala 
version of `zipWithIndex` :) 




[jira] [Commented] (FLINK-2152) Provide zipWithIndex utility in flink-contrib

2015-06-25 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14600805#comment-14600805
 ] 

ASF GitHub Bot commented on FLINK-2152:
---

Github user andralungu commented on the pull request:

https://github.com/apache/flink/pull/832#issuecomment-115134406
  
Hey Theo, 

Thanks a lot for finding my bug there ^^
PR updated to address the Java issues and to contain a pimped Scala 
version of `zipWithIndex` :) 


> Provide zipWithIndex utility in flink-contrib
> -
>
> Key: FLINK-2152
> URL: https://issues.apache.org/jira/browse/FLINK-2152
> Project: Flink
>  Issue Type: Improvement
>  Components: Java API
>Reporter: Robert Metzger
>Assignee: Andra Lungu
>Priority: Trivial
>  Labels: starter
>
> We should provide a simple utility method for zipping elements in a data set 
> with a dense index.
> It's up for discussion whether we want it directly in the API or if we should 
> provide it only as a utility from {{flink-contrib}}.
> I would put it in {{flink-contrib}}.
> See my answer on SO: 
> http://stackoverflow.com/questions/30596556/zipwithindex-on-apache-flink
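The usual trick behind a distributed zipWithIndex is a two-phase scheme. The sketch below is a purely local simulation of that idea (not the flink-contrib implementation; the names are invented): phase 1 counts the elements in each partition, phase 2 gives every partition a starting offset and labels its elements with consecutive indices.

```java
import java.util.ArrayList;
import java.util.List;

class ZipWithIndexSketch {
    record Indexed<T>(long index, T value) {}

    static <T> List<List<Indexed<T>>> zipWithIndex(List<List<T>> partitions) {
        // phase 1: per-partition counts turned into running start offsets
        long[] offsets = new long[partitions.size()];
        long running = 0;
        for (int p = 0; p < partitions.size(); p++) {
            offsets[p] = running;
            running += partitions.get(p).size();
        }
        // phase 2: assign contiguous indices within each partition
        List<List<Indexed<T>>> result = new ArrayList<>();
        for (int p = 0; p < partitions.size(); p++) {
            List<Indexed<T>> part = new ArrayList<>();
            long idx = offsets[p];
            for (T elem : partitions.get(p)) {
                part.add(new Indexed<>(idx++, elem));
            }
            result.add(part);
        }
        return result;
    }
}
```

In a real DataSet the two phases correspond to a counting pass followed by a broadcast of the offsets, which is why the index is dense without requiring a single reducer.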





[GitHub] flink pull request: [FLINK-2271] [FLINK-1522] [gelly] add missing ...

2015-06-25 Thread mxm
Github user mxm commented on the pull request:

https://github.com/apache/flink/pull/865#issuecomment-115167456
  
Thanks for adding tests. The changes look good to me.




[jira] [Commented] (FLINK-2271) PageRank gives wrong results with weighted graph input

2015-06-25 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14600890#comment-14600890
 ] 

ASF GitHub Bot commented on FLINK-2271:
---

Github user mxm commented on the pull request:

https://github.com/apache/flink/pull/865#issuecomment-115167456
  
Thanks for adding tests. The changes look good to me.


> PageRank gives wrong results with weighted graph input
> --
>
> Key: FLINK-2271
> URL: https://issues.apache.org/jira/browse/FLINK-2271
> Project: Flink
>  Issue Type: Bug
>  Components: Gelly
>Affects Versions: 0.9
>Reporter: Vasia Kalavri
>Assignee: Vasia Kalavri
> Fix For: 0.10
>
>
> The current implementation of the PageRank algorithm expects a weighted edge 
> list as input. However, if an edge weight is anything other than 1.0, this 
> produces wrong results.
> We should change the library method and corresponding examples (also 
> GSAPageRank) to expect an unweighted graph and compute the transition 
> probabilities correctly.
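The corrected computation described above can be sketched in a few lines. This is an illustrative stand-in, not the Gelly library code: instead of trusting weights from the input, the transition probability of each edge is derived as 1 / out-degree of its source vertex.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class TransitionProbabilities {
    // edges given as {src, dst} pairs; returns src -> (dst -> probability)
    static Map<Integer, Map<Integer, Double>> compute(List<int[]> edges) {
        // first pass: count the out-degree of every source vertex
        Map<Integer, Integer> outDegree = new HashMap<>();
        for (int[] e : edges) {
            outDegree.merge(e[0], 1, Integer::sum);
        }
        // second pass: uniform probability over each vertex's out-edges
        Map<Integer, Map<Integer, Double>> probs = new HashMap<>();
        for (int[] e : edges) {
            probs.computeIfAbsent(e[0], k -> new HashMap<>())
                 .put(e[1], 1.0 / outDegree.get(e[0]));
        }
        return probs;
    }
}
```

With this, the probabilities leaving each vertex always sum to 1.0 regardless of what weights the input carried, which is the invariant PageRank needs.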





[GitHub] flink pull request: [FLINK-2275] Migrate test from package 'org.ap...

2015-06-25 Thread mjsax
Github user mjsax commented on the pull request:

https://github.com/apache/flink/pull/866#issuecomment-115173704
  
I adapted all tests except for DataSinkITCase and 
ExecutionEnvironmentITCase.
 - DataSinkITCase -> seems to test writing to a file explicitly; it would not 
make sense to change it (tell me if I am wrong)
 - ExecutionEnvironmentITCase -> uses LocalCollectionOutputFormat and does 
not write to disk anyway




[jira] [Commented] (FLINK-2275) Migrate test from package 'org.apache.flink.test.javaApiOperators'

2015-06-25 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2275?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14600905#comment-14600905
 ] 

ASF GitHub Bot commented on FLINK-2275:
---

Github user mjsax commented on the pull request:

https://github.com/apache/flink/pull/866#issuecomment-115173704
  
I adapted all tests except for DataSinkITCase and 
ExecutionEnvironmentITCase.
 - DataSinkITCase -> seems to test writing to a file explicitly; it would not 
make sense to change it (tell me if I am wrong)
 - ExecutionEnvironmentITCase -> uses LocalCollectionOutputFormat and does 
not write to disk anyway


> Migrate test from package 'org.apache.flink.test.javaApiOperators'
> --
>
> Key: FLINK-2275
> URL: https://issues.apache.org/jira/browse/FLINK-2275
> Project: Flink
>  Issue Type: Sub-task
>  Components: Tests
>Affects Versions: 0.10
>Reporter: Matthias J. Sax
>Assignee: Matthias J. Sax
>Priority: Minor
>






[GitHub] flink pull request: [Flink-2030][ml]Online Histogram: Discrete and...

2015-06-25 Thread sachingoel0101
Github user sachingoel0101 commented on the pull request:

https://github.com/apache/flink/pull/861#issuecomment-115179641
  
@thvasilo, how should I link this PR to #832 ? Should I fork Andra's repo 
and create a PR there?




[GitHub] flink pull request: [FLINK-2161] modified Scala shell start script...

2015-06-25 Thread nikste
Github user nikste commented on the pull request:

https://github.com/apache/flink/pull/805#issuecomment-115180597
  
Unfortunately, this did not work: the class was available in the test, but 
not in the shell that the test invokes. 
However, if you add the classpath of the external class to 
```settings.classpath.value``` of the Scala shell before starting it, it seems 
to work.

I added a test that instantiates and prints a DenseVector with the flink-ml 
jar. This should check whether the external jar is sent to the cluster.
The only remaining problem is the name of the jar, which will change when the 
Flink version changes. 




[jira] [Commented] (FLINK-2161) Flink Scala Shell does not support external jars (e.g. Gelly, FlinkML)

2015-06-25 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14600917#comment-14600917
 ] 

ASF GitHub Bot commented on FLINK-2161:
---

Github user nikste commented on the pull request:

https://github.com/apache/flink/pull/805#issuecomment-115180597
  
Unfortunately, this did not work: the class was available in the test, but 
not in the shell that the test invokes. 
However, if you add the classpath of the external class to 
```settings.classpath.value``` of the Scala shell before starting it, it seems 
to work.

I added a test that instantiates and prints a DenseVector with the flink-ml 
jar. This should check whether the external jar is sent to the cluster.
The only remaining problem is the name of the jar, which will change when the 
Flink version changes. 


> Flink Scala Shell does not support external jars (e.g. Gelly, FlinkML)
> --
>
> Key: FLINK-2161
> URL: https://issues.apache.org/jira/browse/FLINK-2161
> Project: Flink
>  Issue Type: Improvement
>Reporter: Till Rohrmann
>Assignee: Nikolaas Steenbergen
>
> Currently, there is no easy way to load and ship external libraries/jars with 
> Flink's Scala shell. Assume that you want to run some Gelly graph algorithms 
> from within the Scala shell; then you have to put the Gelly jar manually in 
> the lib directory and make sure that this jar is also available on your 
> cluster, because it is not shipped with the user code. 
> It would be good to have a simple mechanism to specify additional jars 
> upon startup of the Scala shell. These jars should then also be shipped to 
> the cluster.





[GitHub] flink pull request: [Flink-2030][ml]Online Histogram: Discrete and...

2015-06-25 Thread tillrohrmann
Github user tillrohrmann commented on the pull request:

https://github.com/apache/flink/pull/861#issuecomment-115180791
  
The easiest way is probably to check out her branch or the PR and then 
rebase your work on hers.




[jira] [Commented] (FLINK-2105) Implement Sort-Merge Outer Join algorithm

2015-06-25 Thread Ricky Pogalz (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2105?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14600934#comment-14600934
 ] 

Ricky Pogalz commented on FLINK-2105:
-

Hi,

first of all thanks for your answers [~chiwanpark] and [~fhueske]. We have some 
more questions regarding the scope of this ticket.

# Is the implementation of the OperatorBase in the core project also part of 
this ticket or should it be part of the integration?
# Same question for the Driver. Is the integration of the Iterators into the 
Driver part of this ticket?
# Just for understanding: is it sufficient to integrate the OuterJoinIterators 
into the existing MatchDriver or do we have to create a separate Driver?

Thanks

> Implement Sort-Merge Outer Join algorithm
> -
>
> Key: FLINK-2105
> URL: https://issues.apache.org/jira/browse/FLINK-2105
> Project: Flink
>  Issue Type: Sub-task
>  Components: Local Runtime
>Reporter: Fabian Hueske
>Assignee: Ricky Pogalz
>Priority: Minor
> Fix For: pre-apache
>
>
> Flink does not natively support outer joins at the moment. 
> This issue proposes to implement a sort-merge outer join algorithm that can 
> cover left, right, and full outer joins.
> The implementation can be based on the regular sort-merge join iterator 
> ({{ReusingMergeMatchIterator}} and {{NonReusingMergeMatchIterator}}; see also 
> the {{MatchDriver}} class).
> The Reusing and NonReusing variants differ in whether object instances are 
> reused or new objects are created. I would start with the NonReusing variant 
> which is safer from a user's point of view and should also be easier to 
> implement.
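The core merge step of such an algorithm fits in a short sketch. The following is a compact illustration with invented names (not the MatchDriver API) of a full sort-merge outer join over two key-sorted lists; left and right outer joins fall out by dropping one of the null-emitting branches. Duplicate keys would require emitting a cross product of the matching groups, which is omitted here for brevity.

```java
import java.util.ArrayList;
import java.util.List;

class SortMergeOuterJoin {
    record Pair(int key, String value) {}
    record Row(int key, String left, String right) {}  // null marks "no match"

    static List<Row> join(List<Pair> left, List<Pair> right) {
        List<Row> out = new ArrayList<>();
        int i = 0, j = 0;
        // advance whichever side has the smaller key; emit on both when equal
        while (i < left.size() && j < right.size()) {
            Pair l = left.get(i), r = right.get(j);
            if (l.key() < r.key()) { out.add(new Row(l.key(), l.value(), null)); i++; }
            else if (l.key() > r.key()) { out.add(new Row(r.key(), null, r.value())); j++; }
            else { out.add(new Row(l.key(), l.value(), r.value())); i++; j++; }
        }
        // drain the remainders: unmatched rows from either side
        for (; i < left.size(); i++) out.add(new Row(left.get(i).key(), left.get(i).value(), null));
        for (; j < right.size(); j++) out.add(new Row(right.get(j).key(), null, right.get(j).value()));
        return out;
    }
}
```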





[GitHub] flink pull request: [FLINK-1745] [ml] [WIP] Add exact k-nearest-ne...

2015-06-25 Thread thvasilo
Github user thvasilo commented on the pull request:

https://github.com/apache/flink/pull/696#issuecomment-115189878
  
Just a correction, the functionality you will need is in #832 




[jira] [Commented] (FLINK-1745) Add exact k-nearest-neighbours algorithm to machine learning library

2015-06-25 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14600937#comment-14600937
 ] 

ASF GitHub Bot commented on FLINK-1745:
---

Github user thvasilo commented on the pull request:

https://github.com/apache/flink/pull/696#issuecomment-115189878
  
Just a correction, the functionality you will need is in #832 


> Add exact k-nearest-neighbours algorithm to machine learning library
> 
>
> Key: FLINK-1745
> URL: https://issues.apache.org/jira/browse/FLINK-1745
> Project: Flink
>  Issue Type: New Feature
>  Components: Machine Learning Library
>Reporter: Till Rohrmann
>Assignee: Till Rohrmann
>  Labels: ML, Starter
>
> Even though the k-nearest-neighbours (kNN) [1,2] algorithm is quite trivial, 
> it is still used as a means to classify data and to do regression. This issue 
> focuses on the implementation of an exact kNN (H-BNLJ, H-BRJ) algorithm as 
> proposed in [2].
> Could be a starter task.
> Resources:
> [1] [http://en.wikipedia.org/wiki/K-nearest_neighbors_algorithm]
> [2] [https://www.cs.utah.edu/~lifeifei/papers/mrknnj.pdf]
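As a baseline for the exact variants, the naive brute-force form of kNN is worth keeping in mind. The sketch below is illustrative only (plain brute force, not the blocked H-BNLJ/H-BRJ schemes of [2]): rank every training point by Euclidean distance to the query and keep the k closest.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class BruteForceKnn {
    static double dist(double[] a, double[] b) {
        double s = 0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            s += d * d;
        }
        return Math.sqrt(s);
    }

    // returns the k training points nearest to the query
    static List<double[]> kNearest(List<double[]> training, double[] query, int k) {
        List<double[]> sorted = new ArrayList<>(training);
        sorted.sort(Comparator.comparingDouble(p -> dist(p, query)));
        return sorted.subList(0, Math.min(k, sorted.size()));
    }
}
```

The distributed schemes in [2] avoid the quadratic pairwise comparison by partitioning (blocking) the data so that each block pair is joined locally; the per-pair computation is still the one shown here.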





[GitHub] flink pull request: [FLINK-2264] [gelly] changed the tests to use ...

2015-06-25 Thread vasia
Github user vasia commented on the pull request:

https://github.com/apache/flink/pull/863#issuecomment-115194215
  
Thank you @samk3211! This looks good :)

I see that, like here, @mjsax has also created a utils class for the new 
comparison methods in #866.
Since all migrated tests will be using these methods, I will just move them 
to `TestBaseUtils` before merging this.




[jira] [Commented] (FLINK-2264) Migrate integration tests for Gelly

2015-06-25 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2264?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14600944#comment-14600944
 ] 

ASF GitHub Bot commented on FLINK-2264:
---

Github user vasia commented on the pull request:

https://github.com/apache/flink/pull/863#issuecomment-115194215
  
Thank you @samk3211! This looks good :)

I see that, like here, @mjsax has also created a utils class for the new 
comparison methods in #866.
Since all migrated tests will be using these methods, I will just move them 
to `TestBaseUtils` before merging this.


> Migrate integration tests for Gelly
> ---
>
> Key: FLINK-2264
> URL: https://issues.apache.org/jira/browse/FLINK-2264
> Project: Flink
>  Issue Type: Sub-task
>  Components: Gelly, Tests
>Affects Versions: 0.10
>Reporter: Vasia Kalavri
>Assignee: Samia Khalid
>Priority: Minor
>






[jira] [Commented] (FLINK-2239) print() on DataSet: stream results and print incrementally

2015-06-25 Thread Sebastian Kruse (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2239?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14600948#comment-14600948
 ] 

Sebastian Kruse commented on FLINK-2239:


I used the {{RemoteCollectorOutputFormat}} for this purpose and it always 
worked pretty well. However, its downside is that it uses Java RMI, which 
does not use Flink's serialization stack and also sometimes requires setting 
up the client address via {{-Djava.rmi.server.hostname}}.

Additionally, I would like to remark that there is a more general issue behind 
this: if one wants to ship larger job results to the driver (e.g., in order to 
write them to a local DB), {{collect()}} also falls flat. Something like an 
{{iterate()}} method that streams the result to the client without 
materializing it would help in such cases. The proposed change to {{print()}} 
is then just a special instance of such an {{iterate()}} method.

> print() on DataSet: stream results and print incrementally
> --
>
> Key: FLINK-2239
> URL: https://issues.apache.org/jira/browse/FLINK-2239
> Project: Flink
>  Issue Type: Improvement
>  Components: Distributed Runtime
>Affects Versions: 0.9
>Reporter: Maximilian Michels
> Fix For: 0.10
>
>
> Users find it counter-intuitive that {{print()}} on a DataSet internally 
> calls {{collect()}} and fully materializes the set. This leads to 
> out-of-memory errors on the client. It also leaves users with the feeling 
> that Flink cannot handle large amounts of data and that it fails frequently.
> To improve on this situation requires some major architectural changes in 
> Flink. The easiest solution would probably be to transfer the data from the 
> job manager to the client via the {{BlobManager}}. Alternatively, the client 
> could directly connect to the task managers and fetch the results. 





[jira] [Assigned] (FLINK-2108) Add score function for Predictors

2015-06-25 Thread Theodore Vasiloudis (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-2108?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Theodore Vasiloudis reassigned FLINK-2108:
--

Assignee: Theodore Vasiloudis  (was: Sachin Goel)

> Add score function for Predictors
> -
>
> Key: FLINK-2108
> URL: https://issues.apache.org/jira/browse/FLINK-2108
> Project: Flink
>  Issue Type: Improvement
>  Components: Machine Learning Library
>Reporter: Theodore Vasiloudis
>Assignee: Theodore Vasiloudis
>Priority: Minor
>  Labels: ML
>
> A score function for Predictor implementations should take a DataSet[(I, O)] 
> and an (optional) scoring measure and return a score.
> The DataSet[(I, O)] would probably be the output of the predict function.
> For example in MultipleLinearRegression, we can call predict on a labeled 
> dataset, get back predictions for each item in the data, and then call score 
> with the resulting dataset as an argument; we should then get back a score 
> for the prediction quality, such as the R^2 score.
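The R^2 measure mentioned above is easy to state concretely. The sketch below computes it over (truth, prediction) pairs as a local stand-in for a DataSet[(I, O)]; it is illustrative, not the flink-ml API.

```java
class RSquared {
    // pairs[i] = {trueValue, prediction}; returns the coefficient of
    // determination: 1 - (residual sum of squares / total sum of squares)
    static double score(double[][] pairs) {
        double mean = 0;
        for (double[] p : pairs) mean += p[0];
        mean /= pairs.length;
        double ssRes = 0, ssTot = 0;
        for (double[] p : pairs) {
            ssRes += (p[0] - p[1]) * (p[0] - p[1]);
            ssTot += (p[0] - mean) * (p[0] - mean);
        }
        return 1.0 - ssRes / ssTot;
    }
}
```

A perfect predictor scores 1.0; a predictor that always outputs the mean of the true values scores 0.0, which matches sklearn's convention for `score` on regressors.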





[jira] [Commented] (FLINK-2108) Add score function for Predictors

2015-06-25 Thread Theodore Vasiloudis (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2108?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14600951#comment-14600951
 ] 

Theodore Vasiloudis commented on FLINK-2108:


OK, I will take this then; the interface will be similar to what sklearn uses.

> Add score function for Predictors
> -
>
> Key: FLINK-2108
> URL: https://issues.apache.org/jira/browse/FLINK-2108
> Project: Flink
>  Issue Type: Improvement
>  Components: Machine Learning Library
>Reporter: Theodore Vasiloudis
>Assignee: Sachin Goel
>Priority: Minor
>  Labels: ML
>
> A score function for Predictor implementations should take a DataSet[(I, O)] 
> and an (optional) scoring measure and return a score.
> The DataSet[(I, O)] would probably be the output of the predict function.
> For example in MultipleLinearRegression, we can call predict on a labeled 
> dataset, get back predictions for each item in the data, and then call score 
> with the resulting dataset as an argument and we should get back a score for 
> the prediction quality, such as the R^2 score.





[jira] [Created] (FLINK-2276) Travis build error

2015-06-25 Thread Sachin Goel (JIRA)
Sachin Goel created FLINK-2276:
--

 Summary: Travis build error
 Key: FLINK-2276
 URL: https://issues.apache.org/jira/browse/FLINK-2276
 Project: Flink
  Issue Type: Bug
Reporter: Sachin Goel


testExecutionFailsAfterTaskMarkedFailed failed on Travis. 
Here is the log output: 
https://s3.amazonaws.com/archive.travis-ci.org/jobs/68288986/log.txt





[GitHub] flink pull request: [FLINK-2116] [ml] Reusing predict operation fo...

2015-06-25 Thread tillrohrmann
Github user tillrohrmann commented on the pull request:

https://github.com/apache/flink/pull/772#issuecomment-115198124
  
Actually I wouldn't call it predictSomething, because then we're again 
quite close to the former problem that we have a method whose semantics depend 
on the provided type. And this only confuses users. 

My concern is that the user does not really know what `predictLabeled` 
means. Apparently it is something similar to `predict` but with a label. But 
what is the label? Does it mean that I can apply `predict` on `T <: Vector` and 
`predictLabeled` on `LabeledVector`? Does it mean that I get a labeled result 
type? But don't I already get it with `predict`? Do I have to provide a type 
with a label or can I also supply a vector? 

IMO, the prediction which also returns the true label value deserves a more 
distinguishable name than `predictSomething`, because it has different 
semantics. I can't think of something better than `evaluate` at the moment. But 
it makes it clear that the user has to provide some evaluation `DataSet`, 
meaning some labeled data.




[jira] [Commented] (FLINK-2116) Make pipeline extension require less coding

2015-06-25 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14600958#comment-14600958
 ] 

ASF GitHub Bot commented on FLINK-2116:
---

Github user tillrohrmann commented on the pull request:

https://github.com/apache/flink/pull/772#issuecomment-115198124
  
Actually I wouldn't call it predictSomething, because then we're again 
quite close to the former problem that we have a method whose semantics depend 
on the provided type. And this only confuses users. 

My concern is that the user does not really know what `predictLabeled` 
means. Apparently it is something similar to `predict` but with a label. But 
what is the label? Does it mean that I can apply `predict` on `T <: Vector` and 
`predictLabeled` on `LabeledVector`? Does it mean that I get a labeled result 
type? But don't I already get it with `predict`? Do I have to provide a type 
with a label or can I also supply a vector? 

IMO, the prediction which also returns the true label value deserves a more 
distinguishable name than `predictSomething`, because it has different 
semantics. I can't think of something better than `evaluate` at the moment. But 
it makes it clear that the user has to provide some evaluation `DataSet`, 
meaning some labeled data.


> Make pipeline extension require less coding
> ---
>
> Key: FLINK-2116
> URL: https://issues.apache.org/jira/browse/FLINK-2116
> Project: Flink
>  Issue Type: Improvement
>  Components: Machine Learning Library
>Reporter: Mikio Braun
>Assignee: Till Rohrmann
>Priority: Minor
>
> Right now, implementing pipeline methods for new types, or even adding new 
> methods to pipelines, requires many steps:
> 1) implementing methods for new types
>   - implement an implicit of the corresponding class encapsulating the 
> operation in the companion object
> 2) adding methods to the pipeline
>   - add a method
>   - add a trait for the operation
>   - implement an implicit in the companion object
> These are all objects with many generic parameters, so reducing the work 
> would be great.
> The goal should be that you can really focus on the code to add, with as 
> little boilerplate code as possible.
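The boilerplate the issue describes follows Scala's type-class pattern. Below is a rough Java analogue with invented names (flink-ml itself uses Scala implicits, which this can only approximate): each pipeline method is backed by a generic "operation" object, one per (model, input, output) type combination, which Scala's implicit resolution normally supplies automatically.

```java
class PipelineSketch {
    // the "trait for the operation" from the list above
    interface PredictOperation<M, I, O> {
        O predict(M model, I input);
    }

    record MeanModel(double mean) {}

    // the instance a companion-object implicit would provide in Scala
    static final PredictOperation<MeanModel, Double, Double> PREDICT_DOUBLE =
        (model, input) -> model.mean();

    // the pipeline method: dispatches to whatever operation object fits the types
    static <M, I, O> O predict(M model, I input, PredictOperation<M, I, O> op) {
        return op.predict(model, input);
    }
}
```

In Java the operation object must be passed explicitly; in Scala the implicit search finds it, which is exactly why every new type combination currently costs a trait, a method, and a companion-object implicit.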





[GitHub] flink pull request: [Flink-2030][ml]Online Histogram: Discrete and...

2015-06-25 Thread sachingoel0101
Github user sachingoel0101 commented on the pull request:

https://github.com/apache/flink/pull/861#issuecomment-115199291
  
Where should I place the Histogram implementations? Currently, they are in 
{{org.apache.flink.ml.math}}, but I can't import them from flink-core, where 
DataSetUtils is located. Besides, since the purpose is to make the 
Histograms usable in general, they shouldn't be in the ml library.




[GitHub] flink pull request: [Flink-2030][ml]Online Histogram: Discrete and...

2015-06-25 Thread tillrohrmann
Github user tillrohrmann commented on the pull request:

https://github.com/apache/flink/pull/861#issuecomment-115202017
  
For the moment, I think it's best to place it under 
`org.apache.flink.ml.density`, for example.




[GitHub] flink pull request: [Flink-2030][ml]Online Histogram: Discrete and...

2015-06-25 Thread sachingoel0101
Github user sachingoel0101 commented on the pull request:

https://github.com/apache/flink/pull/861#issuecomment-115204540
  
 How should I import a class in flink.ml.math from, say, flink-java? I tried 
adding flink-staging as a dependency to the pom.xml of flink-java, but to no 
avail. I'm not terribly familiar with Maven.




[GitHub] flink pull request: [Flink-2030][ml]Online Histogram: Discrete and...

2015-06-25 Thread tillrohrmann
Github user tillrohrmann commented on the pull request:

https://github.com/apache/flink/pull/861#issuecomment-115207592
  
You don't. I think it's best at the moment to only make the 
histograms available within the ml package. Everyone who wants to use them can 
then add `flink-ml` as a dependency.




[GitHub] flink pull request: [Flink-2030][ml]Online Histogram: Discrete and...

2015-06-25 Thread sachingoel0101
Github user sachingoel0101 commented on the pull request:

https://github.com/apache/flink/pull/861#issuecomment-115210476
  
Okay. So I guess we can leave adding a createHistogram function to 
DataSetUtils for now [it would also require utilizing FlinkMLTools.block 
for an efficient implementation]. With that pending, this PR is ready to merge. 
Please have a look for any other modifications that are needed.




[GitHub] flink pull request: [FLINK-2093][gelly] Added difference Method

2015-06-25 Thread vasia
Github user vasia commented on the pull request:

https://github.com/apache/flink/pull/818#issuecomment-115215460
  
Thank you @shghatge! I'll merge this :)




[jira] [Commented] (FLINK-2093) Add a difference method to Gelly's Graph class

2015-06-25 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2093?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14601031#comment-14601031
 ] 

ASF GitHub Bot commented on FLINK-2093:
---

Github user vasia commented on the pull request:

https://github.com/apache/flink/pull/818#issuecomment-115215460
  
Thank you @shghatge! I'll merge this :)


> Add a difference method to Gelly's Graph class
> --
>
> Key: FLINK-2093
> URL: https://issues.apache.org/jira/browse/FLINK-2093
> Project: Flink
>  Issue Type: New Feature
>  Components: Gelly
>Affects Versions: 0.9
>Reporter: Andra Lungu
>Assignee: Shivani Ghatge
>Priority: Minor
>
> This method will compute the difference between two graphs, returning a new 
> graph containing the vertices and edges that the current graph and the input 
> graph don't have in common. 
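On plain in-memory sets, the operation described above can be sketched as a symmetric difference (Gelly's actual method operates on vertex and edge DataSets; this standalone class is only an illustration of the set semantics):

```java
import java.util.HashSet;
import java.util.Set;

public class GraphDifferenceSketch {

    // Elements that the two sets do not have in common (symmetric difference),
    // mirroring the description above.
    static <T> Set<T> difference(Set<T> a, Set<T> b) {
        Set<T> result = new HashSet<>(a);
        result.addAll(b);            // union of both sets
        Set<T> common = new HashSet<>(a);
        common.retainAll(b);         // intersection
        result.removeAll(common);    // drop the shared elements
        return result;
    }
}
```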



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-2255) In the TopSpeedWindowing examples, every window contains only 1 element, because event time is in millisec, but eviction is in sec

2015-06-25 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2255?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14601038#comment-14601038
 ] 

ASF GitHub Bot commented on FLINK-2255:
---

Github user asfgit closed the pull request at:

https://github.com/apache/flink/pull/857


> In the TopSpeedWindowing examples, every window contains only 1 element, 
> because event time is in millisec, but eviction is in sec
> --
>
> Key: FLINK-2255
> URL: https://issues.apache.org/jira/browse/FLINK-2255
> Project: Flink
>  Issue Type: Bug
>  Components: Examples, Streaming
>Reporter: Gabor Gevay
>Assignee: Gabor Gevay
>Priority: Minor
>
> The event times are generated by System.currentTimeMillis(), so evictionSec 
> should be multiplied by 1000 when passing it to Time.of.
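The bug boils down to a unit mismatch between millisecond timestamps and a window length given in seconds. A minimal sketch of the conversion involved (the method and class names are illustrative, not the example's actual code):

```java
public class EvictionUnits {

    // True if an element with timestamp ts (milliseconds) falls outside a window of
    // windowSeconds ending at now; the seconds value must be converted to millis first.
    static boolean outsideWindow(long ts, long now, long windowSeconds) {
        return now - ts > windowSeconds * 1000L;
    }
}
```

Without the `* 1000L` conversion, a one-second window would evict everything older than one millisecond, which is why each window ended up holding a single element.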





[jira] [Commented] (FLINK-1956) Runtime context not initialized in RichWindowMapFunction

2015-06-25 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14601037#comment-14601037
 ] 

ASF GitHub Bot commented on FLINK-1956:
---

Github user asfgit closed the pull request at:

https://github.com/apache/flink/pull/855


> Runtime context not initialized in RichWindowMapFunction
> 
>
> Key: FLINK-1956
> URL: https://issues.apache.org/jira/browse/FLINK-1956
> Project: Flink
>  Issue Type: Bug
>  Components: Streaming
>Reporter: Daniel Bali
>Assignee: Márton Balassi
>  Labels: context, runtime, streaming, window
> Fix For: 0.9
>
>
> Trying to access the runtime context in a rich window map function results in 
> an exception. The following snippet demonstrates the bug:
> {code}
> env.generateSequence(0, 1000)
>     .window(Count.of(10))
>     .mapWindow(new RichWindowMapFunction<Long, Tuple2<Long, Long>>() {
>         @Override
>         public void mapWindow(Iterable<Long> input, Collector<Tuple2<Long, Long>> out) throws Exception {
>             long self = getRuntimeContext().getIndexOfThisSubtask();
>             for (long value : input) {
>                 out.collect(new Tuple2<>(self, value));
>             }
>         }
>     }).flatten().print();
> {code}





[GitHub] flink pull request: [FLINK-2255] [streaming] Fixed a bug in TopSpe...

2015-06-25 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/flink/pull/857




[GitHub] flink pull request: [streaming] Properly forward rich window funct...

2015-06-25 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/flink/pull/855




[jira] [Closed] (FLINK-2255) In the TopSpeedWindowing examples, every window contains only 1 element, because event time is in millisec, but eviction is in sec

2015-06-25 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/FLINK-2255?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Márton Balassi closed FLINK-2255.
-
   Resolution: Fixed
Fix Version/s: 0.10

Fixed via af05b94d0d

> In the TopSpeedWindowing examples, every window contains only 1 element, 
> because event time is in millisec, but eviction is in sec
> --
>
> Key: FLINK-2255
> URL: https://issues.apache.org/jira/browse/FLINK-2255
> Project: Flink
>  Issue Type: Bug
>  Components: Examples, Streaming
>Reporter: Gabor Gevay
>Assignee: Gabor Gevay
>Priority: Minor
> Fix For: 0.10
>
>
> The event times are generated by System.currentTimeMillis(), so evictionSec 
> should be multiplied by 1000 when passing it to Time.of.





[jira] [Commented] (FLINK-2230) Add Support for Null-Values in TupleSerializer

2015-06-25 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14601062#comment-14601062
 ] 

ASF GitHub Bot commented on FLINK-2230:
---

GitHub user Shiti opened a pull request:

https://github.com/apache/flink/pull/867

[FLINK-2230] handling null values for TupleSerializer

When serializing, we add a `byte[]` (`BitSet.toByteArray`) before the fields, 
which marks the `null` fields.

When deserializing, we fetch the `byte[]` and obtain the underlying `BitSet` 
(`BitSet.valueOf`). `BitSet.get(index)` is `false` when the value is `null`. For 
each field we check whether it is marked as `null` in the `BitSet` before 
passing it to the fieldSerializer.
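The scheme can be sketched for a tuple of nullable `Integer` fields. This is a simplified standalone version for illustration, not the actual TupleSerializer change in the PR:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.BitSet;

public class NullMarkerSketch {

    // Writes a null-marker byte[] (one bit per field) followed by the non-null fields only.
    static byte[] serialize(Integer[] fields) throws IOException {
        BitSet nonNull = new BitSet(fields.length);
        for (int i = 0; i < fields.length; i++) {
            if (fields[i] != null) nonNull.set(i);
        }
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bos);
        byte[] mask = nonNull.toByteArray();
        out.writeInt(mask.length); // length prefix so the reader knows how many mask bytes follow
        out.write(mask);
        for (Integer f : fields) {
            if (f != null) out.writeInt(f); // null fields are skipped entirely
        }
        return bos.toByteArray();
    }

    static Integer[] deserialize(byte[] data, int arity) throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(data));
        byte[] mask = new byte[in.readInt()];
        in.readFully(mask);
        BitSet nonNull = BitSet.valueOf(mask);
        Integer[] fields = new Integer[arity];
        for (int i = 0; i < arity; i++) {
            fields[i] = nonNull.get(i) ? in.readInt() : null; // a false bit means a null field
        }
        return fields;
    }

    public static void main(String[] args) throws IOException {
        Integer[] roundTrip = deserialize(serialize(new Integer[]{1, null, 3}), 3);
        System.out.println(java.util.Arrays.toString(roundTrip)); // prints [1, null, 3]
    }
}
```

In the real serializer the per-field writes would be delegated to the field serializers rather than `writeInt`.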

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/Shiti/flink FLINK-2230

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/867.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #867


commit a4d2731eb75032f958e96323a075eb8bc7d11c73
Author: Shiti 
Date:   2015-06-25T10:36:10Z

[FLINK-2230]handling null values for TupleSerializer




> Add Support for Null-Values in TupleSerializer
> --
>
> Key: FLINK-2230
> URL: https://issues.apache.org/jira/browse/FLINK-2230
> Project: Flink
>  Issue Type: Sub-task
>Reporter: Shiti Saxena
>Assignee: Shiti Saxena
>Priority: Minor
>






[GitHub] flink pull request: [FLINK-2230] handling null values for TupleSer...

2015-06-25 Thread Shiti
GitHub user Shiti opened a pull request:

https://github.com/apache/flink/pull/867

[FLINK-2230] handling null values for TupleSerializer

When serializing, we add a `byte[]` (`BitSet.toByteArray`) before the fields, 
which marks the `null` fields.

When deserializing, we fetch the `byte[]` and obtain the underlying `BitSet` 
(`BitSet.valueOf`). `BitSet.get(index)` is `false` when the value is `null`. For 
each field we check whether it is marked as `null` in the `BitSet` before 
passing it to the fieldSerializer.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/Shiti/flink FLINK-2230

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/867.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #867


commit a4d2731eb75032f958e96323a075eb8bc7d11c73
Author: Shiti 
Date:   2015-06-25T10:36:10Z

[FLINK-2230]handling null values for TupleSerializer






[jira] [Commented] (FLINK-2230) Add Support for Null-Values in TupleSerializer

2015-06-25 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14601092#comment-14601092
 ] 

ASF GitHub Bot commented on FLINK-2230:
---

Github user StephanEwen commented on the pull request:

https://github.com/apache/flink/pull/867#issuecomment-115227753
  
We actually had a discussion about this quite a few times. I also raised my 
concerns in the discussion of the issue, to which no one reacted. 

The serialization subsystem (and tuples) is among the most critical parts of 
Flink. There are so many side effects and considerations: comparators that 
interact with serializers, normalized keys, subclasses and tagging, object 
creation (GC impact). None of that is taken into account here.

For something as crucial as this, we cannot make changes without discussing 
them thoroughly beforehand, and ideally also documenting them. It makes sense 
to do this before writing the code.

Sorry if I appear like the bad guy here, but we are on the verge of getting 
spaghetti code and inconsistencies into one of the most crucial parts, and 
we cannot do that.


> Add Support for Null-Values in TupleSerializer
> --
>
> Key: FLINK-2230
> URL: https://issues.apache.org/jira/browse/FLINK-2230
> Project: Flink
>  Issue Type: Sub-task
>Reporter: Shiti Saxena
>Assignee: Shiti Saxena
>Priority: Minor
>






[GitHub] flink pull request: [FLINK-2230] handling null values for TupleSer...

2015-06-25 Thread StephanEwen
Github user StephanEwen commented on the pull request:

https://github.com/apache/flink/pull/867#issuecomment-115227753
  
We actually had a discussion about this quite a few times. I also raised my 
concerns in the discussion of the issue, to which no one reacted. 

The serialization subsystem (and tuples) is among the most critical parts of 
Flink. There are so many side effects and considerations: comparators that 
interact with serializers, normalized keys, subclasses and tagging, object 
creation (GC impact). None of that is taken into account here.

For something as crucial as this, we cannot make changes without discussing 
them thoroughly beforehand, and ideally also documenting them. It makes sense 
to do this before writing the code.

Sorry if I appear like the bad guy here, but we are on the verge of getting 
spaghetti code and inconsistencies into one of the most crucial parts, and 
we cannot do that.




[jira] [Commented] (FLINK-2230) Add Support for Null-Values in TupleSerializer

2015-06-25 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14601093#comment-14601093
 ] 

ASF GitHub Bot commented on FLINK-2230:
---

Github user StephanEwen commented on the pull request:

https://github.com/apache/flink/pull/867#issuecomment-115228662
  
Also, apparently no tests were ever run after these changes; all builds fail 
on the build server, even on basic checkstyle rules.


> Add Support for Null-Values in TupleSerializer
> --
>
> Key: FLINK-2230
> URL: https://issues.apache.org/jira/browse/FLINK-2230
> Project: Flink
>  Issue Type: Sub-task
>Reporter: Shiti Saxena
>Assignee: Shiti Saxena
>Priority: Minor
>






[GitHub] flink pull request: [FLINK-2230] handling null values for TupleSer...

2015-06-25 Thread StephanEwen
Github user StephanEwen commented on the pull request:

https://github.com/apache/flink/pull/867#issuecomment-115228662
  
Also, apparently no tests were ever run after these changes; all builds fail 
on the build server, even on basic checkstyle rules.




[GitHub] flink pull request: [tools] Make release script a bit more flexibl...

2015-06-25 Thread rmetzger
GitHub user rmetzger opened a pull request:

https://github.com/apache/flink/pull/868

[tools] Make release script a bit more flexible



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/rmetzger/flink release_script

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/868.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #868


commit 1cefd8abd3a0527f380f22db0baae6bebb2a952f
Author: Robert Metzger 
Date:   2015-06-25T12:58:08Z

[tools] Make release script a bit more flexible






[GitHub] flink pull request: [FLINK-2230] handling null values for TupleSer...

2015-06-25 Thread aljoscha
Github user aljoscha commented on the pull request:

https://github.com/apache/flink/pull/867#issuecomment-115248743
  
@StephanEwen hinted that the best way to go would be to decouple the 
RowTypeInfo completely from the TupleTypeInfo/TupleSerializerBase. This way, we 
get null-value support in the Table API without changing the existing code for 
tuples. This would require changing the RowTypeInfo to no longer be a child of 
CaseClassTypeInfo and creating a non-subclass RowSerializer and RowComparator.




[jira] [Commented] (FLINK-2230) Add Support for Null-Values in TupleSerializer

2015-06-25 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14601127#comment-14601127
 ] 

ASF GitHub Bot commented on FLINK-2230:
---

Github user aljoscha commented on the pull request:

https://github.com/apache/flink/pull/867#issuecomment-115248743
  
@StephanEwen hinted that the best way to go would be to decouple the 
RowTypeInfo completely from the TupleTypeInfo/TupleSerializerBase. This way, we 
get null-value support in the Table API without changing the existing code for 
tuples. This would require changing the RowTypeInfo to no longer be a child of 
CaseClassTypeInfo and creating a non-subclass RowSerializer and RowComparator.


> Add Support for Null-Values in TupleSerializer
> --
>
> Key: FLINK-2230
> URL: https://issues.apache.org/jira/browse/FLINK-2230
> Project: Flink
>  Issue Type: Sub-task
>Reporter: Shiti Saxena
>Assignee: Shiti Saxena
>Priority: Minor
>






[jira] [Commented] (FLINK-2271) PageRank gives wrong results with weighted graph input

2015-06-25 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14601137#comment-14601137
 ] 

ASF GitHub Bot commented on FLINK-2271:
---

Github user vasia commented on the pull request:

https://github.com/apache/flink/pull/865#issuecomment-115252595
  
Thank you for looking at this @mxm :) I'll merge.


> PageRank gives wrong results with weighted graph input
> --
>
> Key: FLINK-2271
> URL: https://issues.apache.org/jira/browse/FLINK-2271
> Project: Flink
>  Issue Type: Bug
>  Components: Gelly
>Affects Versions: 0.9
>Reporter: Vasia Kalavri
>Assignee: Vasia Kalavri
> Fix For: 0.10
>
>
> The current implementation of the PageRank algorithm expects a weighted edge 
> list as input. However, if the edge weight is other than 1.0, this will 
> result in wrong results.
> We should change the library method and corresponding examples (also 
> GSAPageRank) to expect an unweighted graph and compute the transition 
> probabilities correctly.
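The intended semantics can be sketched with a tiny in-memory power iteration in which every vertex distributes its rank uniformly, i.e. with transition probability 1/outDegree, regardless of any edge weights. This is a simplified stand-in for the Gelly library method, not its actual code:

```java
import java.util.Arrays;

public class PageRankSketch {

    // adj[u] lists the out-neighbors of vertex u; beta is the damping factor.
    static double[] pageRank(int[][] adj, double beta, int iterations) {
        int n = adj.length;
        double[] rank = new double[n];
        Arrays.fill(rank, 1.0 / n); // uniform initial distribution
        for (int it = 0; it < iterations; it++) {
            double[] next = new double[n];
            Arrays.fill(next, (1 - beta) / n); // teleport term
            for (int u = 0; u < n; u++) {
                if (adj[u].length == 0) continue; // dangling vertices contribute only the teleport share here
                double share = beta * rank[u] / adj[u].length; // transition probability is 1/outDegree
                for (int v : adj[u]) {
                    next[v] += share;
                }
            }
            rank = next;
        }
        return rank;
    }
}
```

Using a raw edge weight instead of `rank[u] / outDegree` is exactly the bug described above: the outgoing probabilities then no longer sum to one.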





[GitHub] flink pull request: [FLINK-2271] [FLINK-1522] [gelly] add missing ...

2015-06-25 Thread vasia
Github user vasia commented on the pull request:

https://github.com/apache/flink/pull/865#issuecomment-115252595
  
Thank you for looking at this @mxm :) I'll merge.




[GitHub] flink pull request: New operator state interfaces

2015-06-25 Thread StephanEwen
Github user StephanEwen commented on a diff in the pull request:

https://github.com/apache/flink/pull/747#discussion_r33253164
  
--- Diff: 
flink-staging/flink-streaming/flink-streaming-core/src/main/java/org/apache/flink/streaming/api/state/PartitionedStreamOperatorState.java
 ---
@@ -0,0 +1,134 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.streaming.api.state;
+
+import java.io.Serializable;
+import java.util.Map;
+
+import org.apache.flink.api.common.state.OperatorState;
+import org.apache.flink.api.common.state.StateCheckpointer;
+import org.apache.flink.api.java.functions.KeySelector;
+import org.apache.flink.runtime.state.PartitionedStateStore;
+import org.apache.flink.runtime.state.StateHandle;
+import org.apache.flink.runtime.state.StateHandleProvider;
+import org.apache.flink.streaming.api.operators.OneInputStreamOperator;
+
+/**
+ * Implementation of the {@link OperatorState} interface for partitioned user
+ * states. It provides methods for checkpointing and restoring partitioned
+ * operator states upon failure.
+ * 
+ * @param <IN>
+ *            Input type of the underlying {@link OneInputStreamOperator}
+ * @param <S>
+ *            Type of the underlying {@link OperatorState}.
+ * @param <C>
+ *            Type of the state snapshot.
+ */
+public class PartitionedStreamOperatorState<IN, S, C extends Serializable> extends
+		StreamOperatorState<S, C> {
+
+	// KeySelector for getting the state partition key for each input
+	private final KeySelector<IN, Serializable> keySelector;
+
+	private final PartitionedStateStore<S, C> stateStore;
+
+	private S defaultState;
+
+	// The currently processed input, used to extract the appropriate key
+	private IN currentInput;
+
+	public PartitionedStreamOperatorState(StateCheckpointer<S, C> checkpointer,
+			StateHandleProvider<C> provider, KeySelector<IN, Serializable> keySelector) {
+		super(checkpointer, provider);
+		this.keySelector = keySelector;
+		this.stateStore = new EagerStateStore<S, C>(checkpointer, provider);
+	}
+
+	@SuppressWarnings("unchecked")
+	public PartitionedStreamOperatorState(StateHandleProvider<C> provider,
+			KeySelector<IN, Serializable> keySelector) {
+		this((StateCheckpointer<S, C>) new BasicCheckpointer(), provider, keySelector);
+	}
+
+   @Override
+   public S getState() {
+   if (currentInput == null) {
+   return null;
+   } else {
+   try {
+   Serializable key = 
keySelector.getKey(currentInput);
+   if(stateStore.containsKey(key)){
+   return stateStore.getStateForKey(key);
+   }else{
+   return defaultState;
+   }
+   } catch (Exception e) {
+   throw new RuntimeException(e);
+   }
+   }
+   }
+
+   @Override
+   public void updateState(S state) {
+   if (state == null) {
+   throw new RuntimeException("Cannot set state to null.");
+   }
+   if (currentInput == null) {
+   throw new RuntimeException("Need a valid input for 
updating a state.");
+   } else {
+   try {
+   
stateStore.setStateForKey(keySelector.getKey(currentInput), state);
+   } catch (Exception e) {
+   throw new RuntimeException(e);
--- End diff --

Yeah, I think this is not very nice to do. Every level of wrapping just 
makes the stack traces harder to read and the exception messages worse.

This is an indicator that the signature of `updateState()` is simply missing 
an exception declaration.
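The two alternatives can be shown in isolation. The method names below are illustrative, not from the PR:

```java
public class ExceptionSignatures {

    // Wrapping: the real cause ends up nested one level deeper per wrapping layer.
    static void updateStateWrapped() {
        try {
            throw new IllegalStateException("key extraction failed");
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    // Declaring: the signature carries the exception, so no wrapping is needed
    // and the caller sees the original type and message directly.
    static void updateStateDeclared() throws Exception {
        throw new IllegalStateException("key extraction failed");
    }
}
```

With the wrapped variant, callers must unwrap `getCause()` chains to find out what actually went wrong; with the declared variant, they catch `IllegalStateException` directly.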



[GitHub] flink pull request: New operator state interfaces

2015-06-25 Thread StephanEwen
Github user StephanEwen commented on a diff in the pull request:

https://github.com/apache/flink/pull/747#discussion_r33253332
  
--- Diff: 
flink-staging/flink-streaming/flink-streaming-core/src/main/java/org/apache/flink/streaming/api/state/PartitionedStreamOperatorState.java
 ---
@@ -0,0 +1,134 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.streaming.api.state;
+
+import java.io.Serializable;
+import java.util.Map;
+
+import org.apache.flink.api.common.state.OperatorState;
+import org.apache.flink.api.common.state.StateCheckpointer;
+import org.apache.flink.api.java.functions.KeySelector;
+import org.apache.flink.runtime.state.PartitionedStateStore;
+import org.apache.flink.runtime.state.StateHandle;
+import org.apache.flink.runtime.state.StateHandleProvider;
+import org.apache.flink.streaming.api.operators.OneInputStreamOperator;
+
+/**
+ * Implementation of the {@link OperatorState} interface for partitioned user
+ * states. It provides methods for checkpointing and restoring partitioned
+ * operator states upon failure.
+ * 
+ * @param <IN>
+ *            Input type of the underlying {@link OneInputStreamOperator}
+ * @param <S>
+ *            Type of the underlying {@link OperatorState}.
+ * @param <C>
+ *            Type of the state snapshot.
+ */
+public class PartitionedStreamOperatorState<IN, S, C extends Serializable> extends
+		StreamOperatorState<S, C> {
+
+	// KeySelector for getting the state partition key for each input
+	private final KeySelector<IN, Serializable> keySelector;
+
+	private final PartitionedStateStore<S, C> stateStore;
+
+	private S defaultState;
+
+	// The currently processed input, used to extract the appropriate key
+	private IN currentInput;
+
+	public PartitionedStreamOperatorState(StateCheckpointer<S, C> checkpointer,
+			StateHandleProvider<C> provider, KeySelector<IN, Serializable> keySelector) {
+		super(checkpointer, provider);
+		this.keySelector = keySelector;
+		this.stateStore = new EagerStateStore<S, C>(checkpointer, provider);
+	}
+
+	@SuppressWarnings("unchecked")
+	public PartitionedStreamOperatorState(StateHandleProvider<C> provider,
+			KeySelector<IN, Serializable> keySelector) {
+		this((StateCheckpointer<S, C>) new BasicCheckpointer(), provider, keySelector);
+	}
+
+   @Override
+   public S getState() {
+   if (currentInput == null) {
+   return null;
+   } else {
+   try {
+   Serializable key = 
keySelector.getKey(currentInput);
+   if(stateStore.containsKey(key)){
+   return stateStore.getStateForKey(key);
+   }else{
+   return defaultState;
+   }
+   } catch (Exception e) {
+   throw new RuntimeException(e);
+   }
+   }
+   }
+
+   @Override
+   public void updateState(S state) {
+   if (state == null) {
+   throw new RuntimeException("Cannot set state to null.");
+   }
+   if (currentInput == null) {
+   throw new RuntimeException("Need a valid input for 
updating a state.");
+   } else {
+   try {
+   
stateStore.setStateForKey(keySelector.getKey(currentInput), state);
+   } catch (Exception e) {
+   throw new RuntimeException(e);
--- End diff --

It is a good idea to start adding these exceptions to the signatures, and 
this is a good point to start.



[jira] [Commented] (FLINK-2239) print() on DataSet: stream results and print incrementally

2015-06-25 Thread Stephan Ewen (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2239?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14601159#comment-14601159
 ] 

Stephan Ewen commented on FLINK-2239:
-

There is pending work to support larger results for {{collect()}} by letting 
them go through the BLOB manager. That is still limited by client memory, 
though.

The concern about direct connections between client and workers is that this 
fails in many enterprise setups due to firewalls. We have seen multiple 
installations with "edge servers". The client can communicate with the master, 
but not the workers.

I like the idea of {{iterate()}}. Would be a bit of an effort, but seems like a 
clean solution.
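The client-side half of such an iterate()-style API could look roughly like this. The names are hypothetical; the point is simply that the client holds one element at a time instead of materializing the whole result:

```java
import java.util.Iterator;

public class IncrementalPrint {

    // Prints results as they arrive instead of collecting the full result first;
    // returns the number of elements printed.
    static <T> int printIncrementally(Iterator<T> results) {
        int count = 0;
        while (results.hasNext()) {
            System.out.println(results.next()); // O(1) client memory per element
            count++;
        }
        return count;
    }
}
```

The hard part is not this loop but producing the iterator: the elements would have to be streamed from the cluster to the client (via the master, given the firewall constraint above) in chunks.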

> print() on DataSet: stream results and print incrementally
> --
>
> Key: FLINK-2239
> URL: https://issues.apache.org/jira/browse/FLINK-2239
> Project: Flink
>  Issue Type: Improvement
>  Components: Distributed Runtime
>Affects Versions: 0.9
>Reporter: Maximilian Michels
> Fix For: 0.10
>
>
> Users find it counter-intuitive that {{print()}} on a DataSet internally 
> calls {{collect()}} and fully materializes the set. This leads to 
> out-of-memory errors on the client. It also leaves users with the feeling 
> that Flink cannot handle large amounts of data and that it fails frequently.
> To improve on this situation requires some major architectural changes in 
> Flink. The easiest solution would probably be to transfer the data from the 
> job manager to the client via the {{BlobManager}}. Alternatively, the client 
> could directly connect to the task managers and fetch the results. 





[GitHub] flink pull request: [tools] Make release script a bit more flexibl...

2015-06-25 Thread uce
Github user uce commented on the pull request:

https://github.com/apache/flink/pull/868#issuecomment-115262360
  
Good changes +1

There are some assumptions about the call order of the newly introduced 
functions though (like you have to call prepare and make_src_release [or be in 
the checked out repo] in that order). I guess it's fine, we don't want to 
over-engineer this ;)




[GitHub] flink pull request: New operator state interfaces

2015-06-25 Thread gyfora
Github user gyfora commented on a diff in the pull request:

https://github.com/apache/flink/pull/747#discussion_r33256890
  
--- Diff: 
flink-staging/flink-streaming/flink-streaming-core/src/main/java/org/apache/flink/streaming/api/state/PartitionedStreamOperatorState.java
 ---
@@ -0,0 +1,134 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.streaming.api.state;
+
+import java.io.Serializable;
+import java.util.Map;
+
+import org.apache.flink.api.common.state.OperatorState;
+import org.apache.flink.api.common.state.StateCheckpointer;
+import org.apache.flink.api.java.functions.KeySelector;
+import org.apache.flink.runtime.state.PartitionedStateStore;
+import org.apache.flink.runtime.state.StateHandle;
+import org.apache.flink.runtime.state.StateHandleProvider;
+import org.apache.flink.streaming.api.operators.OneInputStreamOperator;
+
+/**
+ * Implementation of the {@link OperatorState} interface for partitioned user
+ * states. It provides methods for checkpointing and restoring partitioned
+ * operator states upon failure.
+ * 
+ * @param <IN>
+ *            Input type of the underlying {@link OneInputStreamOperator}
+ * @param <S>
+ *            Type of the underlying {@link OperatorState}.
+ * @param <C>
+ *            Type of the state snapshot.
+ */
+public class PartitionedStreamOperatorState<IN, S, C extends Serializable> extends
+		StreamOperatorState<S, C> {
+
+	// KeySelector for getting the state partition key for each input
+	private final KeySelector<IN, Serializable> keySelector;
+
+	private final PartitionedStateStore<S, C> stateStore;
+
+	private S defaultState;
+
+	// The currently processed input, used to extract the appropriate key
+	private IN currentInput;
+
+	public PartitionedStreamOperatorState(StateCheckpointer<S, C> checkpointer,
+			StateHandleProvider<C> provider, KeySelector<IN, Serializable> keySelector) {
+		super(checkpointer, provider);
+		this.keySelector = keySelector;
+		this.stateStore = new EagerStateStore<S, C>(checkpointer, provider);
+	}
+
+	@SuppressWarnings("unchecked")
+	public PartitionedStreamOperatorState(StateHandleProvider<C> provider,
+			KeySelector<IN, Serializable> keySelector) {
+		this((StateCheckpointer<S, C>) new BasicCheckpointer(), provider, keySelector);
+	}
+
+   @Override
+   public S getState() {
+   if (currentInput == null) {
+   return null;
+   } else {
+   try {
+   Serializable key = 
keySelector.getKey(currentInput);
+   if(stateStore.containsKey(key)){
+   return stateStore.getStateForKey(key);
+   }else{
+   return defaultState;
+   }
+   } catch (Exception e) {
+   throw new RuntimeException(e);
+   }
+   }
+   }
+
+   @Override
+   public void updateState(S state) {
+   if (state == null) {
+   throw new RuntimeException("Cannot set state to null.");
+   }
+   if (currentInput == null) {
+   throw new RuntimeException("Need a valid input for 
updating a state.");
+   } else {
+   try {
+   
stateStore.setStateForKey(keySelector.getKey(currentInput), state);
+   } catch (Exception e) {
+   throw new RuntimeException(e);
--- End diff --

In this case the exception caught is the one thrown by the key selector, which 
would already have thrown the same exception in the partitioner at the previous 
output anyway.

There is no reason to propagate this exception here.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] flink pull request: New operator state interfaces

2015-06-25 Thread gyfora
Github user gyfora commented on the pull request:

https://github.com/apache/flink/pull/747#issuecomment-115268784
  
Should we merge this?




[GitHub] flink pull request: New operator state interfaces

2015-06-25 Thread StephanEwen
Github user StephanEwen commented on a diff in the pull request:

https://github.com/apache/flink/pull/747#discussion_r33258578
  
--- Diff: 
flink-staging/flink-streaming/flink-streaming-core/src/main/java/org/apache/flink/streaming/api/state/PartitionedStreamOperatorState.java
 ---
@@ -0,0 +1,134 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.streaming.api.state;
+
+import java.io.Serializable;
+import java.util.Map;
+
+import org.apache.flink.api.common.state.OperatorState;
+import org.apache.flink.api.common.state.StateCheckpointer;
+import org.apache.flink.api.java.functions.KeySelector;
+import org.apache.flink.runtime.state.PartitionedStateStore;
+import org.apache.flink.runtime.state.StateHandle;
+import org.apache.flink.runtime.state.StateHandleProvider;
+import org.apache.flink.streaming.api.operators.OneInputStreamOperator;
+
+/**
+ * Implementation of the {@link OperatorState} interface for partitioned user
+ * states. It provides methods for checkpointing and restoring partitioned
+ * operator states upon failure.
+ *
+ * @param <IN>
+ *            Input type of the underlying {@link OneInputStreamOperator}
+ * @param <S>
+ *            Type of the underlying {@link OperatorState}.
+ * @param <C>
+ *            Type of the state snapshot.
+ */
+public class PartitionedStreamOperatorState<IN, S, C extends Serializable> extends
+      StreamOperatorState<S, C> {
+
+   // KeySelector for getting the state partition key for each input
+   private final KeySelector<IN, Serializable> keySelector;
+
+   private final PartitionedStateStore<S, C> stateStore;
+
+   private S defaultState;
+
+   // The currently processed input, used to extract the appropriate key
+   private IN currentInput;
+
+   public PartitionedStreamOperatorState(StateCheckpointer<S, C> checkpointer,
+         StateHandleProvider<C> provider, KeySelector<IN, Serializable> keySelector) {
+      super(checkpointer, provider);
+      this.keySelector = keySelector;
+      this.stateStore = new EagerStateStore<S, C>(checkpointer, provider);
+   }
+
+   @SuppressWarnings("unchecked")
+   public PartitionedStreamOperatorState(StateHandleProvider<C> provider,
+         KeySelector<IN, Serializable> keySelector) {
+      this((StateCheckpointer<S, C>) new BasicCheckpointer(), provider, keySelector);
+   }
+
+   @Override
+   public S getState() {
+      if (currentInput == null) {
+         return null;
+      } else {
+         try {
+            Serializable key = keySelector.getKey(currentInput);
+            if (stateStore.containsKey(key)) {
+               return stateStore.getStateForKey(key);
+            } else {
+               return defaultState;
+            }
+         } catch (Exception e) {
+            throw new RuntimeException(e);
+         }
+      }
+   }
+
+   @Override
+   public void updateState(S state) {
+      if (state == null) {
+         throw new RuntimeException("Cannot set state to null.");
+      }
+      if (currentInput == null) {
+         throw new RuntimeException("Need a valid input for updating a state.");
+      } else {
+         try {
+            stateStore.setStateForKey(keySelector.getKey(currentInput), state);
+         } catch (Exception e) {
+            throw new RuntimeException(e);
--- End diff --

If the state is actually residing outside Flink, the updates can fail as 
well. If we want to permit this, an (I/O) exception in the signature would make 
sense anyway.

In this case, wrapping an exception may make sense, but why not add a 
message that explains what is happening? Something like "Use
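
The wrap-with-message pattern suggested in this review could look like the 
following sketch. The class and method names here are illustrative stand-ins 
mirroring the quoted diff, not Flink's actual types:

```java
// Hedged sketch: wrapping a checked exception with a descriptive message,
// as suggested in the review. The stand-in getKey models a key selector
// that may throw a checked exception.
public class WrapWithMessage {

    static Object getKey(Object input) throws Exception {
        if (input == null) {
            throw new Exception("no input");
        }
        return input;
    }

    static void updateState(Object input) {
        try {
            getKey(input);
        } catch (Exception e) {
            // The message tells the user *what* failed, not just that
            // something failed somewhere -- the point of the comment above.
            throw new RuntimeException(
                    "Could not extract the state partition key from the current input.", e);
        }
    }

    public static void main(String[] args) {
        try {
            updateState(null);
        } catch (RuntimeException e) {
            System.out.println(e.getMessage());
        }
    }
}
```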

[GitHub] flink pull request: New operator state interfaces

2015-06-25 Thread gyfora
Github user gyfora commented on a diff in the pull request:

https://github.com/apache/flink/pull/747#discussion_r33259296
  
--- Diff: 
flink-staging/flink-streaming/flink-streaming-core/src/main/java/org/apache/flink/streaming/api/state/PartitionedStreamOperatorState.java
 ---
@@ -0,0 +1,134 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.streaming.api.state;
+
+import java.io.Serializable;
+import java.util.Map;
+
+import org.apache.flink.api.common.state.OperatorState;
+import org.apache.flink.api.common.state.StateCheckpointer;
+import org.apache.flink.api.java.functions.KeySelector;
+import org.apache.flink.runtime.state.PartitionedStateStore;
+import org.apache.flink.runtime.state.StateHandle;
+import org.apache.flink.runtime.state.StateHandleProvider;
+import org.apache.flink.streaming.api.operators.OneInputStreamOperator;
+
+/**
+ * Implementation of the {@link OperatorState} interface for partitioned user
+ * states. It provides methods for checkpointing and restoring partitioned
+ * operator states upon failure.
+ *
+ * @param <IN>
+ *            Input type of the underlying {@link OneInputStreamOperator}
+ * @param <S>
+ *            Type of the underlying {@link OperatorState}.
+ * @param <C>
+ *            Type of the state snapshot.
+ */
+public class PartitionedStreamOperatorState<IN, S, C extends Serializable> extends
+      StreamOperatorState<S, C> {
+
+   // KeySelector for getting the state partition key for each input
+   private final KeySelector<IN, Serializable> keySelector;
+
+   private final PartitionedStateStore<S, C> stateStore;
+
+   private S defaultState;
+
+   // The currently processed input, used to extract the appropriate key
+   private IN currentInput;
+
+   public PartitionedStreamOperatorState(StateCheckpointer<S, C> checkpointer,
+         StateHandleProvider<C> provider, KeySelector<IN, Serializable> keySelector) {
+      super(checkpointer, provider);
+      this.keySelector = keySelector;
+      this.stateStore = new EagerStateStore<S, C>(checkpointer, provider);
+   }
+
+   @SuppressWarnings("unchecked")
+   public PartitionedStreamOperatorState(StateHandleProvider<C> provider,
+         KeySelector<IN, Serializable> keySelector) {
+      this((StateCheckpointer<S, C>) new BasicCheckpointer(), provider, keySelector);
+   }
+
+   @Override
+   public S getState() {
+      if (currentInput == null) {
+         return null;
+      } else {
+         try {
+            Serializable key = keySelector.getKey(currentInput);
+            if (stateStore.containsKey(key)) {
+               return stateStore.getStateForKey(key);
+            } else {
+               return defaultState;
+            }
+         } catch (Exception e) {
+            throw new RuntimeException(e);
+         }
+      }
+   }
+
+   @Override
+   public void updateState(S state) {
+      if (state == null) {
+         throw new RuntimeException("Cannot set state to null.");
+      }
+      if (currentInput == null) {
+         throw new RuntimeException("Need a valid input for updating a state.");
+      } else {
+         try {
+            stateStore.setStateForKey(keySelector.getKey(currentInput), state);
+         } catch (Exception e) {
+            throw new RuntimeException(e);
--- End diff --

Then I guess the getState method should throw an IOException as well



[GitHub] flink pull request: New operator state interfaces

2015-06-25 Thread StephanEwen
Github user StephanEwen commented on a diff in the pull request:

https://github.com/apache/flink/pull/747#discussion_r33259429
  
--- Diff: 
flink-staging/flink-streaming/flink-streaming-core/src/main/java/org/apache/flink/streaming/api/state/PartitionedStreamOperatorState.java
 ---
@@ -0,0 +1,134 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.streaming.api.state;
+
+import java.io.Serializable;
+import java.util.Map;
+
+import org.apache.flink.api.common.state.OperatorState;
+import org.apache.flink.api.common.state.StateCheckpointer;
+import org.apache.flink.api.java.functions.KeySelector;
+import org.apache.flink.runtime.state.PartitionedStateStore;
+import org.apache.flink.runtime.state.StateHandle;
+import org.apache.flink.runtime.state.StateHandleProvider;
+import org.apache.flink.streaming.api.operators.OneInputStreamOperator;
+
+/**
+ * Implementation of the {@link OperatorState} interface for partitioned user
+ * states. It provides methods for checkpointing and restoring partitioned
+ * operator states upon failure.
+ *
+ * @param <IN>
+ *            Input type of the underlying {@link OneInputStreamOperator}
+ * @param <S>
+ *            Type of the underlying {@link OperatorState}.
+ * @param <C>
+ *            Type of the state snapshot.
+ */
+public class PartitionedStreamOperatorState<IN, S, C extends Serializable> extends
+      StreamOperatorState<S, C> {
+
+   // KeySelector for getting the state partition key for each input
+   private final KeySelector<IN, Serializable> keySelector;
+
+   private final PartitionedStateStore<S, C> stateStore;
+
+   private S defaultState;
+
+   // The currently processed input, used to extract the appropriate key
+   private IN currentInput;
+
+   public PartitionedStreamOperatorState(StateCheckpointer<S, C> checkpointer,
+         StateHandleProvider<C> provider, KeySelector<IN, Serializable> keySelector) {
+      super(checkpointer, provider);
+      this.keySelector = keySelector;
+      this.stateStore = new EagerStateStore<S, C>(checkpointer, provider);
+   }
+
+   @SuppressWarnings("unchecked")
+   public PartitionedStreamOperatorState(StateHandleProvider<C> provider,
+         KeySelector<IN, Serializable> keySelector) {
+      this((StateCheckpointer<S, C>) new BasicCheckpointer(), provider, keySelector);
+   }
+
+   @Override
+   public S getState() {
+      if (currentInput == null) {
+         return null;
+      } else {
+         try {
+            Serializable key = keySelector.getKey(currentInput);
+            if (stateStore.containsKey(key)) {
+               return stateStore.getStateForKey(key);
+            } else {
+               return defaultState;
+            }
+         } catch (Exception e) {
+            throw new RuntimeException(e);
+         }
+      }
+   }
+
+   @Override
+   public void updateState(S state) {
+      if (state == null) {
+         throw new RuntimeException("Cannot set state to null.");
+      }
+      if (currentInput == null) {
+         throw new RuntimeException("Need a valid input for updating a state.");
+      } else {
+         try {
+            stateStore.setStateForKey(keySelector.getKey(currentInput), state);
+         } catch (Exception e) {
+            throw new RuntimeException(e);
--- End diff --

If we want to keep this open, then yes. Really depends on that decision.

On the other hand, removing exceptions usually does not break code. Adding 
them does...



[jira] [Commented] (FLINK-2105) Implement Sort-Merge Outer Join algorithm

2015-06-25 Thread Chiwan Park (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2105?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14601254#comment-14601254
 ] 

Chiwan Park commented on FLINK-2105:


Hi [~r-pogalz],

I think this issue covers only the implementation of the iterators, not the 
integration. FLINK-687 should cover the integration with the drivers and optimizers.

We need a new Driver because an outer join returns a different result from an 
equi-join (MatchDriver). But the Driver is not for the sort-merge based outer join 
only; a hash-based outer join will use the same Driver. If I understand 
correctly, a Driver returns the same result regardless of the strategy used.

> Implement Sort-Merge Outer Join algorithm
> -
>
> Key: FLINK-2105
> URL: https://issues.apache.org/jira/browse/FLINK-2105
> Project: Flink
>  Issue Type: Sub-task
>  Components: Local Runtime
>Reporter: Fabian Hueske
>Assignee: Ricky Pogalz
>Priority: Minor
> Fix For: pre-apache
>
>
> Flink does not natively support outer joins at the moment. 
> This issue proposes to implement a sort-merge outer join algorithm that can 
> cover left, right, and full outer joins.
> The implementation can be based on the regular sort-merge join iterator 
> ({{ReusingMergeMatchIterator}} and {{NonReusingMergeMatchIterator}}, see also 
> {{MatchDriver}} class)
> The Reusing and NonReusing variants differ in whether object instances are 
> reused or new objects are created. I would start with the NonReusing variant 
> which is safer from a user's point of view and should also be easier to 
> implement.
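
The merge logic the ticket describes can be sketched as follows. This is a 
hedged illustration of a full outer merge join over two inputs pre-sorted on 
the join key; Flink's real iterators operate on serialized records and must 
also handle duplicate keys (cross products), which are omitted here for brevity:

```java
import java.util.ArrayList;
import java.util.List;

// Hedged sketch of a sort-merge full outer join: both inputs are assumed
// sorted; unmatched keys emit "null" on the missing side. Restricting the
// emitted pairs to one side yields the left or right outer join variants.
public class SortMergeOuterJoinSketch {

    static List<String> fullOuterJoin(int[] left, int[] right) {
        List<String> out = new ArrayList<>();
        int i = 0, j = 0;
        while (i < left.length && j < right.length) {
            if (left[i] == right[j]) {
                out.add(left[i] + "," + right[j]);  // matched pair
                i++; j++;
            } else if (left[i] < right[j]) {
                out.add(left[i] + ",null");         // left outer part
                i++;
            } else {
                out.add("null," + right[j]);        // right outer part
                j++;
            }
        }
        // Drain whichever side still has unmatched keys.
        while (i < left.length) out.add(left[i++] + ",null");
        while (j < right.length) out.add("null," + right[j++]);
        return out;
    }

    public static void main(String[] args) {
        System.out.println(fullOuterJoin(new int[]{1, 2, 4}, new int[]{2, 3, 4}));
        // -> [1,null, 2,2, null,3, 4,4]
    }
}
```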



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: New operator state interfaces

2015-06-25 Thread gyfora
Github user gyfora commented on a diff in the pull request:

https://github.com/apache/flink/pull/747#discussion_r33259615
  
--- Diff: 
flink-staging/flink-streaming/flink-streaming-core/src/main/java/org/apache/flink/streaming/api/state/PartitionedStreamOperatorState.java
 ---
@@ -0,0 +1,134 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.streaming.api.state;
+
+import java.io.Serializable;
+import java.util.Map;
+
+import org.apache.flink.api.common.state.OperatorState;
+import org.apache.flink.api.common.state.StateCheckpointer;
+import org.apache.flink.api.java.functions.KeySelector;
+import org.apache.flink.runtime.state.PartitionedStateStore;
+import org.apache.flink.runtime.state.StateHandle;
+import org.apache.flink.runtime.state.StateHandleProvider;
+import org.apache.flink.streaming.api.operators.OneInputStreamOperator;
+
+/**
+ * Implementation of the {@link OperatorState} interface for partitioned user
+ * states. It provides methods for checkpointing and restoring partitioned
+ * operator states upon failure.
+ *
+ * @param <IN>
+ *            Input type of the underlying {@link OneInputStreamOperator}
+ * @param <S>
+ *            Type of the underlying {@link OperatorState}.
+ * @param <C>
+ *            Type of the state snapshot.
+ */
+public class PartitionedStreamOperatorState<IN, S, C extends Serializable> extends
+      StreamOperatorState<S, C> {
+
+   // KeySelector for getting the state partition key for each input
+   private final KeySelector<IN, Serializable> keySelector;
+
+   private final PartitionedStateStore<S, C> stateStore;
+
+   private S defaultState;
+
+   // The currently processed input, used to extract the appropriate key
+   private IN currentInput;
+
+   public PartitionedStreamOperatorState(StateCheckpointer<S, C> checkpointer,
+         StateHandleProvider<C> provider, KeySelector<IN, Serializable> keySelector) {
+      super(checkpointer, provider);
+      this.keySelector = keySelector;
+      this.stateStore = new EagerStateStore<S, C>(checkpointer, provider);
+   }
+
+   @SuppressWarnings("unchecked")
+   public PartitionedStreamOperatorState(StateHandleProvider<C> provider,
+         KeySelector<IN, Serializable> keySelector) {
+      this((StateCheckpointer<S, C>) new BasicCheckpointer(), provider, keySelector);
+   }
+
+   @Override
+   public S getState() {
+      if (currentInput == null) {
+         return null;
+      } else {
+         try {
+            Serializable key = keySelector.getKey(currentInput);
+            if (stateStore.containsKey(key)) {
+               return stateStore.getStateForKey(key);
+            } else {
+               return defaultState;
+            }
+         } catch (Exception e) {
+            throw new RuntimeException(e);
+         }
+      }
+   }
+
+   @Override
+   public void updateState(S state) {
+      if (state == null) {
+         throw new RuntimeException("Cannot set state to null.");
+      }
+      if (currentInput == null) {
+         throw new RuntimeException("Need a valid input for updating a state.");
+      } else {
+         try {
+            stateStore.setStateForKey(keySelector.getKey(currentInput), state);
+         } catch (Exception e) {
+            throw new RuntimeException(e);
--- End diff --

Okay, good point :)



[GitHub] flink pull request: New operator state interfaces

2015-06-25 Thread StephanEwen
Github user StephanEwen commented on the pull request:

https://github.com/apache/flink/pull/747#issuecomment-115275285
  
I think we have no real blocker here. I would prefer that the exception issue 
be addressed (a message for the wrapping exception).

Everything else will probably show best when we implement sample jobs and 
sample backends for this new functionality.




[GitHub] flink pull request: New operator state interfaces

2015-06-25 Thread gyfora
Github user gyfora commented on the pull request:

https://github.com/apache/flink/pull/747#issuecomment-115277122
  
Okay I will fix the exceptions and will merge it afterwards
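
The direction agreed on in this thread — letting I/O failures surface through 
the state interface instead of being wrapped in RuntimeException — could be 
sketched roughly as below. The interface and class names are illustrative 
stand-ins, not Flink's actual API:

```java
import java.io.IOException;

// Hypothetical sketch: both accessors declare IOException so that state
// backends residing outside Flink can report failures directly.
interface PartitionedState<S> {
    S getState() throws IOException;
    void updateState(S state) throws IOException;
}

public class StateInterfaceSketch {
    public static void main(String[] args) throws IOException {
        // A trivial in-memory implementation that never fails; a real
        // backend would throw IOException on connection or storage errors.
        PartitionedState<String> state = new PartitionedState<String>() {
            private String value = "initial";
            public String getState() { return value; }
            public void updateState(String s) { value = s; }
        };
        state.updateState("updated");
        System.out.println(state.getState());
    }
}
```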




[jira] [Commented] (FLINK-2066) Make delay between execution retries configurable

2015-06-25 Thread Nuno Miguel Marques dos Santos (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14601529#comment-14601529
 ] 

Nuno Miguel Marques dos Santos commented on FLINK-2066:
---

Hi guys.

I am going to start working on this issue.

If any questions come up, I'll be sure to give a shout on the mailing list!

> Make delay between execution retries configurable
> -
>
> Key: FLINK-2066
> URL: https://issues.apache.org/jira/browse/FLINK-2066
> Project: Flink
>  Issue Type: Improvement
>  Components: Core
>Affects Versions: 0.9
>Reporter: Stephan Ewen
>  Labels: starter
>
> Flink allows to specify a delay between execution retries. This helps to let 
> some external failure causes fully manifest themselves before the restart is 
> attempted.
> The delay is currently defined only system wide.
> We should add it to the {{ExecutionConfig}} of a job to allow per-job 
> specification.
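
The behavior to be made configurable can be modeled generically as below. 
This only illustrates the retry-with-delay semantics the ticket wants exposed 
per job via the ExecutionConfig; it is not Flink code:

```java
// Generic sketch of "retry with a configurable delay between attempts".
// The delay lets external failure causes manifest fully before a restart.
public class RetryWithDelaySketch {

    static int attempts = 0;

    // Stand-in task that fails twice before succeeding.
    static String flakyTask() {
        attempts++;
        if (attempts < 3) throw new RuntimeException("transient failure");
        return "done";
    }

    static String runWithRetries(int maxRetries, long delayMillis) throws InterruptedException {
        for (int attempt = 0; ; attempt++) {
            try {
                return flakyTask();
            } catch (RuntimeException e) {
                if (attempt >= maxRetries) throw e;  // retries exhausted
                Thread.sleep(delayMillis);           // the configurable delay
            }
        }
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(runWithRetries(3, 10));
        System.out.println(attempts);
    }
}
```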





[jira] [Resolved] (FLINK-2232) StormWordCountLocalITCase fails

2015-06-25 Thread Matthias J. Sax (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-2232?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias J. Sax resolved FLINK-2232.

Resolution: Fixed

> StormWordCountLocalITCase fails
> ---
>
> Key: FLINK-2232
> URL: https://issues.apache.org/jira/browse/FLINK-2232
> Project: Flink
>  Issue Type: Bug
>  Components: Streaming, Tests
>Affects Versions: master
>Reporter: Ufuk Celebi
>Assignee: Matthias J. Sax
>Priority: Minor
>
> https://travis-ci.org/apache/flink/jobs/66936476
>  
> {code}
> StormWordCountLocalITCase>StreamingProgramTestBase.testJobWithoutObjectReuse:109->postSubmit:40->TestBaseUtils.compareResultsByLinesInMemory:256->TestBaseUtils.compareResultsByLinesInMemory:270
>  Different number of lines in expected and obtained result. expected:<801> 
> but was:<0>
> {code}
> Can we disable the test until this is fixed?





[GitHub] flink pull request: [tools] Make release script a bit more flexibl...

2015-06-25 Thread hsaputra
Github user hsaputra commented on the pull request:

https://github.com/apache/flink/pull/868#issuecomment-115366539
  
Hi @rmetzger, could you summarize the intention of this PR? What is the 
final goal of the changes?




[GitHub] flink pull request: [FLINK-2093][gelly] Added difference Method

2015-06-25 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/flink/pull/818




[GitHub] flink pull request: [FLINK-2264] [gelly] changed the tests to use ...

2015-06-25 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/flink/pull/863




[jira] [Commented] (FLINK-2093) Add a difference method to Gelly's Graph class

2015-06-25 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2093?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14601971#comment-14601971
 ] 

ASF GitHub Bot commented on FLINK-2093:
---

Github user asfgit closed the pull request at:

https://github.com/apache/flink/pull/818


> Add a difference method to Gelly's Graph class
> --
>
> Key: FLINK-2093
> URL: https://issues.apache.org/jira/browse/FLINK-2093
> Project: Flink
>  Issue Type: New Feature
>  Components: Gelly
>Affects Versions: 0.9
>Reporter: Andra Lungu
>Assignee: Shivani Ghatge
>Priority: Minor
>
> This method will compute the difference between two graphs, returning a new 
> graph containing the vertices and edges that the current graph and the input 
> graph don't have in common. 





[jira] [Commented] (FLINK-2264) Migrate integration tests for Gelly

2015-06-25 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2264?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14601972#comment-14601972
 ] 

ASF GitHub Bot commented on FLINK-2264:
---

Github user asfgit closed the pull request at:

https://github.com/apache/flink/pull/863


> Migrate integration tests for Gelly
> ---
>
> Key: FLINK-2264
> URL: https://issues.apache.org/jira/browse/FLINK-2264
> Project: Flink
>  Issue Type: Sub-task
>  Components: Gelly, Tests
>Affects Versions: 0.10
>Reporter: Vasia Kalavri
>Assignee: Samia Khalid
>Priority: Minor
>






[jira] [Resolved] (FLINK-2264) Migrate integration tests for Gelly

2015-06-25 Thread Vasia Kalavri (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-2264?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vasia Kalavri resolved FLINK-2264.
--
   Resolution: Fixed
Fix Version/s: 0.10

Congrats on your first contribution [~Samia]!

> Migrate integration tests for Gelly
> ---
>
> Key: FLINK-2264
> URL: https://issues.apache.org/jira/browse/FLINK-2264
> Project: Flink
>  Issue Type: Sub-task
>  Components: Gelly, Tests
>Affects Versions: 0.10
>Reporter: Vasia Kalavri
>Assignee: Samia Khalid
>Priority: Minor
> Fix For: 0.10
>
>






[jira] [Resolved] (FLINK-2093) Add a difference method to Gelly's Graph class

2015-06-25 Thread Vasia Kalavri (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-2093?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vasia Kalavri resolved FLINK-2093.
--
   Resolution: Implemented
Fix Version/s: 0.10

Congrats on your first contribution [~shivani94] :))

> Add a difference method to Gelly's Graph class
> --
>
> Key: FLINK-2093
> URL: https://issues.apache.org/jira/browse/FLINK-2093
> Project: Flink
>  Issue Type: New Feature
>  Components: Gelly
>Affects Versions: 0.9
>Reporter: Andra Lungu
>Assignee: Shivani Ghatge
>Priority: Minor
> Fix For: 0.10
>
>
> This method will compute the difference between two graphs, returning a new 
> graph containing the vertices and edges that the current graph and the input 
> graph don't have in common. 





[jira] [Resolved] (FLINK-1522) Add tests for the library methods and examples

2015-06-25 Thread Vasia Kalavri (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-1522?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vasia Kalavri resolved FLINK-1522.
--
   Resolution: Fixed
Fix Version/s: 0.10

> Add tests for the library methods and examples
> --
>
> Key: FLINK-1522
> URL: https://issues.apache.org/jira/browse/FLINK-1522
> Project: Flink
>  Issue Type: New Feature
>  Components: Gelly
>Reporter: Vasia Kalavri
>Assignee: Vasia Kalavri
>  Labels: easyfix, test
> Fix For: 0.10
>
>
> The current tests in gelly test one method at a time. We should have some 
> tests for complete applications. As a start, we could add one test case per 
> example and this way also make sure that our graph library methods actually 
> give correct results.
> I'm assigning this to [~andralungu] because she has already implemented the 
> test for SSSP, but I will help as well.





[jira] [Commented] (FLINK-2163) VertexCentricConfigurationITCase sometimes fails on Travis

2015-06-25 Thread Vasia Kalavri (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2163?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14602000#comment-14602000
 ] 

Vasia Kalavri commented on FLINK-2163:
--

This test has now been changed to use collect(). Can we assume that this issue 
is now resolved?

> VertexCentricConfigurationITCase sometimes fails on Travis
> --
>
> Key: FLINK-2163
> URL: https://issues.apache.org/jira/browse/FLINK-2163
> Project: Flink
>  Issue Type: Bug
>  Components: Gelly
>Reporter: Aljoscha Krettek
>
> This is the relevant output from the log:
> {code}
> testIterationINDirection[Execution mode = 
> CLUSTER](org.apache.flink.graph.test.VertexCentricConfigurationITCase)  Time 
> elapsed: 0.587 sec  <<< FAILURE!
> java.lang.AssertionError: Different number of lines in expected and obtained 
> result. expected:<5> but was:<2>
>   at org.junit.Assert.fail(Assert.java:88)
>   at org.junit.Assert.failNotEquals(Assert.java:743)
>   at org.junit.Assert.assertEquals(Assert.java:118)
>   at org.junit.Assert.assertEquals(Assert.java:555)
>   at 
> org.apache.flink.test.util.TestBaseUtils.compareResultsByLinesInMemory(TestBaseUtils.java:270)
>   at 
> org.apache.flink.test.util.TestBaseUtils.compareResultsByLinesInMemory(TestBaseUtils.java:256)
>   at 
> org.apache.flink.graph.test.VertexCentricConfigurationITCase.after(VertexCentricConfigurationITCase.java:70)
> Results :
> Failed tests: 
>   
> VertexCentricConfigurationITCase.after:70->TestBaseUtils.compareResultsByLinesInMemory:256->TestBaseUtils.compareResultsByLinesInMemory:270
>  Different number of lines in expected and obtained result. expected:<5> but 
> was:<2>
> {code}
> https://travis-ci.org/aljoscha/flink/jobs/65403502





[jira] [Resolved] (FLINK-2271) PageRank gives wrong results with weighted graph input

2015-06-25 Thread Vasia Kalavri (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-2271?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vasia Kalavri resolved FLINK-2271.
--
Resolution: Fixed

> PageRank gives wrong results with weighted graph input
> --
>
> Key: FLINK-2271
> URL: https://issues.apache.org/jira/browse/FLINK-2271
> Project: Flink
>  Issue Type: Bug
>  Components: Gelly
>Affects Versions: 0.9
>Reporter: Vasia Kalavri
>Assignee: Vasia Kalavri
> Fix For: 0.10
>
>
> The current implementation of the PageRank algorithm expects a weighted edge 
> list as input. However, if the edge weight is other than 1.0, this will 
> result in wrong results.
> We should change the library method and corresponding examples (also 
> GSAPageRank) to expect an unweighted graph and compute the transition 
> probabilities correctly.
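
The fix the ticket describes — deriving transition probabilities from the 
graph structure instead of trusting input edge weights — can be sketched as 
below. This is a hedged stand-alone illustration, not Gelly's implementation:

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hedged sketch: treat the graph as unweighted and assign each out-edge of
// a vertex the uniform transition probability 1 / out-degree(source).
public class TransitionProbabilitySketch {

    static Map<Integer, Double> transitionProbabilities(
            Map<Integer, List<Integer>> adjacency, int src) {
        List<Integer> targets = adjacency.get(src);
        double p = 1.0 / targets.size();  // uniform over out-edges
        Map<Integer, Double> probs = new LinkedHashMap<>();
        for (int t : targets) {
            probs.put(t, p);
        }
        return probs;
    }

    public static void main(String[] args) {
        Map<Integer, List<Integer>> adj = new HashMap<>();
        adj.put(1, Arrays.asList(2, 3, 4, 5));  // vertex 1 has out-degree 4
        System.out.println(transitionProbabilities(adj, 1));
    }
}
```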


