[jira] [Commented] (FLINK-2127) The GSA Documentation has trailing </p>s
[ https://issues.apache.org/jira/browse/FLINK-2127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14568721#comment-14568721 ]

Maximilian Michels commented on FLINK-2127:
-------------------------------------------

[~uce] Actually I cannot, because I have no access to the Apache infrastructure. I could ask one of the infra people, though. I managed to reproduce and solve the problem locally. This is actually an issue with the manually inserted line break {{<br>}}. Those shouldn't be used anyway. Rebuilding our docs at the moment to check whether the issue has been resolved.

> The GSA Documentation has trailing </p>s
> ----------------------------------------
>
>                 Key: FLINK-2127
>                 URL: https://issues.apache.org/jira/browse/FLINK-2127
>             Project: Flink
>          Issue Type: Bug
>          Components: Documentation, Gelly
>    Affects Versions: 0.9
>            Reporter: Andra Lungu
>            Priority: Minor
>
> Within the GSA section of the documentation, there are trailing <p class="text-center"> ... </p> tags. It would be nice to remove them :)

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Commented] (FLINK-2069) writeAsCSV function in DataStream Scala API creates no file
[ https://issues.apache.org/jira/browse/FLINK-2069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14568786#comment-14568786 ]

Aljoscha Krettek commented on FLINK-2069:
-----------------------------------------

Yes, you are right, but I think in the `CsvOutputFormat` the buffer is just used because all the fields and field separators are written to the output stream separately. It still doesn't fix the bug that [~fobeligi] reported, however.

> writeAsCSV function in DataStream Scala API creates no file
> -----------------------------------------------------------
>
>                 Key: FLINK-2069
>                 URL: https://issues.apache.org/jira/browse/FLINK-2069
>             Project: Flink
>          Issue Type: Bug
>          Components: Streaming
>            Reporter: Faye Beligianni
>            Priority: Blocker
>              Labels: Streaming
>             Fix For: 0.9
>
> When the {{writeAsCSV}} function is used in the DataStream Scala API, no file is created in the specified path.
[GitHub] flink pull request: [FLINK-2103] Expose partitionBy to user
Github user mbalassi commented on the pull request:

    https://github.com/apache/flink/pull/743#issuecomment-107875574

    While merging I realized that it is missing from the windowed and connected datastreams and from the Scala API. Adding those as well.

---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---
[jira] [Commented] (FLINK-1731) Add kMeans clustering algorithm to machine learning library
[ https://issues.apache.org/jira/browse/FLINK-1731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14568664#comment-14568664 ]

ASF GitHub Bot commented on FLINK-1731:
---------------------------------------

Github user sachingoel0101 commented on the pull request:

    https://github.com/apache/flink/pull/700#issuecomment-107831123

    Hey guys. You might wanna look at the initialization schemes here: https://github.com/apache/flink/pull/757

> Add kMeans clustering algorithm to machine learning library
> -----------------------------------------------------------
>
>                 Key: FLINK-1731
>                 URL: https://issues.apache.org/jira/browse/FLINK-1731
>             Project: Flink
>          Issue Type: New Feature
>          Components: Machine Learning Library
>            Reporter: Till Rohrmann
>            Assignee: Peter Schrott
>              Labels: ML
>
> The Flink repository already contains a kMeans implementation, but it is not yet ported to the machine learning library. I assume that only the used data types have to be adapted and then it can be more or less directly moved to flink-ml. The kMeans++ [1] and kMeans|| [2] algorithms constitute a better implementation because they improve the initial seeding phase to achieve near-optimal clustering. It might be worthwhile to implement kMeans||.
>
> Resources:
> [1] http://ilpubs.stanford.edu:8090/778/1/2006-13.pdf
> [2] http://theory.stanford.edu/~sergei/papers/vldb12-kmpar.pdf
[jira] [Commented] (FLINK-2132) Java version parsing is not working for OpenJDK
[ https://issues.apache.org/jira/browse/FLINK-2132?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14568805#comment-14568805 ]

Ufuk Celebi commented on FLINK-2132:
------------------------------------

The OpenJDK 7 version string looks as expected:

{code}
java version "1.7.0_75"
OpenJDK Runtime Environment (IcedTea 2.5.4) (Arch Linux build 7.u75_2.5.4-1-x86_64)
OpenJDK 64-Bit Server VM (build 24.75-b04, mixed mode)
{code}

> Java version parsing is not working for OpenJDK
> -----------------------------------------------
>
>                 Key: FLINK-2132
>                 URL: https://issues.apache.org/jira/browse/FLINK-2132
>             Project: Flink
>          Issue Type: Bug
>          Components: Start-Stop Scripts
>    Affects Versions: master
>            Reporter: Ufuk Celebi
>            Priority: Critical
>
> Reported by [~aljoscha]:
> {code}
> /home/flink/flink-bin/bin/../bin/jobmanager.sh: line 32: [: openjdk version "1.8.0_40-internal": integer expression expected
> {code}
> On Ubuntu 14.10 with OpenJDK 8. The script expects a String of format
> {code}
> java version "1.8.0_20"
> Java(TM) SE Runtime Environment (build 1.8.0_20-b26)
> Java HotSpot(TM) 64-Bit Server VM (build 25.20-b23, mixed mode)
> {code}
> but the OpenJDK string looks as follows:
> {code}
> openjdk version "1.8.0_40-internal"
> OpenJDK Runtime Environment (build 1.8.0_40-internal-b09)
> OpenJDK 64-Bit Server VM (build 25.40-b13, mixed mode)
> {code}
[jira] [Assigned] (FLINK-2132) Java version parsing is not working for OpenJDK
[ https://issues.apache.org/jira/browse/FLINK-2132?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ufuk Celebi reassigned FLINK-2132:
----------------------------------

    Assignee: Ufuk Celebi
[GitHub] flink pull request: [FLINK-2103] Expose partitionBy to user
Github user mbalassi commented on the pull request:

    https://github.com/apache/flink/pull/743#issuecomment-107853134

    Merging.
[jira] [Updated] (FLINK-2080) Execute Flink with sbt
[ https://issues.apache.org/jira/browse/FLINK-2080?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Metzger updated FLINK-2080:
----------------------------------

    Component/s: Documentation

> Execute Flink with sbt
> ----------------------
>
>                 Key: FLINK-2080
>                 URL: https://issues.apache.org/jira/browse/FLINK-2080
>             Project: Flink
>          Issue Type: Improvement
>          Components: docs, Documentation
>    Affects Versions: 0.8.1
>            Reporter: Christian Wuertz
>            Priority: Minor
>
> I tried to execute some of the Flink example applications on my local machine using sbt. To get this running without class loading issues, it was important to make sure that Flink is executed in its own JVM and not in the sbt JVM. This can be done very easily, but it would have been nice to know that in advance. So maybe you guys want to add this to the Flink documentation. An example can be found here: https://github.com/Teots/flink-sbt (the trick was to add {{fork in run := true}} to the build.sbt).
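The setup described in the report boils down to one setting in `build.sbt`. A minimal sketch, assuming the project name, Scala version, and Flink artifact coordinates (these are illustrative, not taken from the linked repository):

```scala
// build.sbt -- minimal sketch; name and version values are placeholders

name := "flink-sbt-example"

scalaVersion := "2.10.4"

libraryDependencies += "org.apache.flink" % "flink-clients" % "0.8.1"

// Run the Flink program in a forked JVM instead of inside sbt's own JVM,
// which avoids the class loading issues described above.
fork in run := true
```

With this in place, `sbt run` launches the job in a separate JVM with a clean classpath.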
[GitHub] flink pull request: [FLINK-2102] [ml] Add predict operation for La...
Github user tillrohrmann commented on the pull request:

    https://github.com/apache/flink/pull/744#issuecomment-107897490

    LGTM. I will merge it as a temporary solution for the manual evaluation.
[GitHub] flink pull request: [FLINK-1731] [ml] Implementation of Feature K-...
Github user sachingoel0101 commented on the pull request:

    https://github.com/apache/flink/pull/700#issuecomment-107831123

    Hey guys. You might wanna look at the initialization schemes here: https://github.com/apache/flink/pull/757
[jira] [Updated] (FLINK-2132) Java version parsing is not working for OpenJDK
[ https://issues.apache.org/jira/browse/FLINK-2132?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ufuk Celebi updated FLINK-2132:
-------------------------------

    Fix Version/s: 0.9
[jira] [Commented] (FLINK-2103) Expose partitionBy to the user in Stream API
[ https://issues.apache.org/jira/browse/FLINK-2103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14568754#comment-14568754 ]

ASF GitHub Bot commented on FLINK-2103:
---------------------------------------

Github user mbalassi commented on the pull request:

    https://github.com/apache/flink/pull/743#issuecomment-107854661

    Good point, will do that anyway because of the renaming discussed on the mailing list.

> Expose partitionBy to the user in Stream API
> --------------------------------------------
>
>                 Key: FLINK-2103
>                 URL: https://issues.apache.org/jira/browse/FLINK-2103
>             Project: Flink
>          Issue Type: Improvement
>    Affects Versions: 0.9
>            Reporter: Aljoscha Krettek
>            Assignee: Aljoscha Krettek
>
> Is there a reason why this is not exposed to the user? I could see cases where this would be useful to have.
[GitHub] flink pull request: [FLINK-2103] Expose partitionBy to user
Github user mbalassi commented on the pull request:

    https://github.com/apache/flink/pull/743#issuecomment-107854661

    Good point, will do that anyway because of the renaming discussed on the mailing list.
[jira] [Created] (FLINK-2132) Java version parsing is not working for OpenJDK
Ufuk Celebi created FLINK-2132:
----------------------------------

             Summary: Java version parsing is not working for OpenJDK
                 Key: FLINK-2132
                 URL: https://issues.apache.org/jira/browse/FLINK-2132
             Project: Flink
          Issue Type: Bug
          Components: Start-Stop Scripts
    Affects Versions: master
            Reporter: Ufuk Celebi
            Priority: Critical

Reported by [~aljoscha]:
{code}
/home/flink/flink-bin/bin/../bin/jobmanager.sh: line 32: [: openjdk version "1.8.0_40-internal": integer expression expected
{code}
On Ubuntu 14.10 with OpenJDK 8. The script expects a String of format
{code}
java version "1.8.0_20"
Java(TM) SE Runtime Environment (build 1.8.0_20-b26)
Java HotSpot(TM) 64-Bit Server VM (build 25.20-b23, mixed mode)
{code}
but the OpenJDK string looks as follows:
{code}
openjdk version "1.8.0_40-internal"
OpenJDK Runtime Environment (build 1.8.0_40-internal-b09)
OpenJDK 64-Bit Server VM (build 25.40-b13, mixed mode)
{code}
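A parsing approach that keys on the quoted version string rather than on the vendor prefix handles both outputs above. This is a rough sketch, not the actual jobmanager.sh code; the function name is illustrative:

```shell
# Extract the Java minor version (the "8" in 1.8.0_xx) from the first line
# of `java -version` output. It keys on the quoted version string, so it
# accepts both 'java version "1.8.0_20"' (Oracle) and
# 'openjdk version "1.8.0_40-internal"' (OpenJDK).
parse_java_minor_version() {
    # $1: first line of `java -version` output
    echo "$1" | sed -n 's/.*version "\([0-9]*\)\.\([0-9]*\)\..*/\2/p'
}

parse_java_minor_version 'java version "1.8.0_20"'              # -> 8
parse_java_minor_version 'openjdk version "1.8.0_40-internal"'  # -> 8
```

Because the vendor prefix is ignored entirely, the `[: ... integer expression expected` failure from the report cannot occur: the function either yields a bare integer or the empty string.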
[jira] [Commented] (FLINK-2102) Add predict operation for LabeledVector
[ https://issues.apache.org/jira/browse/FLINK-2102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14568850#comment-14568850 ]

ASF GitHub Bot commented on FLINK-2102:
---------------------------------------

Github user tillrohrmann commented on the pull request:

    https://github.com/apache/flink/pull/744#issuecomment-107897490

    LGTM. I will merge it as a temporary solution for the manual evaluation.

> Add predict operation for LabeledVector
> ---------------------------------------
>
>                 Key: FLINK-2102
>                 URL: https://issues.apache.org/jira/browse/FLINK-2102
>             Project: Flink
>          Issue Type: Improvement
>          Components: Machine Learning Library
>            Reporter: Theodore Vasiloudis
>            Assignee: Theodore Vasiloudis
>            Priority: Minor
>              Labels: ML
>             Fix For: 0.9
>
> Currently we can only call predict on DataSet[V <: Vector]. A lot of times, though, we have a DataSet[LabeledVector] that we split into a train and a test set. We should be able to make predictions on the test DataSet[LabeledVector] without having to transform it into a DataSet[Vector].
[jira] [Commented] (FLINK-2103) Expose partitionBy to the user in Stream API
[ https://issues.apache.org/jira/browse/FLINK-2103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14568750#comment-14568750 ]

ASF GitHub Bot commented on FLINK-2103:
---------------------------------------

Github user aljoscha commented on the pull request:

    https://github.com/apache/flink/pull/743#issuecomment-107853775

    We should first update the documentation.
[GitHub] flink pull request: [FLINK-2103] Expose partitionBy to user
Github user aljoscha commented on the pull request:

    https://github.com/apache/flink/pull/743#issuecomment-107853775

    We should first update the documentation.
[jira] [Updated] (FLINK-1884) Outdated Spargel docs
[ https://issues.apache.org/jira/browse/FLINK-1884?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Metzger updated FLINK-1884:
----------------------------------

    Component/s: Documentation

> Outdated Spargel docs
> ---------------------
>
>                 Key: FLINK-1884
>                 URL: https://issues.apache.org/jira/browse/FLINK-1884
>             Project: Flink
>          Issue Type: Bug
>          Components: docs, Documentation, Spargel
>            Reporter: Vasia Kalavri
>
> It seems like the example in the Spargel guide hasn't been updated for the past few versions. The example code uses a {{SpargelIteration}}, {{FileDataSource}}, {{FileDataSink}} and creates a {{Plan}}. We could either update the example there or, since we're going to deprecate Spargel, we could remove the example from the guide and add the correct version in the Spargel-to-Gelly migration guide (FLINK-1871).
[jira] [Commented] (FLINK-2103) Expose partitionBy to the user in Stream API
[ https://issues.apache.org/jira/browse/FLINK-2103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14568742#comment-14568742 ]

ASF GitHub Bot commented on FLINK-2103:
---------------------------------------

Github user mbalassi commented on the pull request:

    https://github.com/apache/flink/pull/743#issuecomment-107853134

    Merging.
[jira] [Commented] (FLINK-2103) Expose partitionBy to the user in Stream API
[ https://issues.apache.org/jira/browse/FLINK-2103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14568812#comment-14568812 ]

ASF GitHub Bot commented on FLINK-2103:
---------------------------------------

Github user mbalassi commented on the pull request:

    https://github.com/apache/flink/pull/743#issuecomment-107875574

    While merging I realized that it is missing from the windowed and connected datastreams and from the Scala API. Adding those as well.
[jira] [Commented] (FLINK-2132) Java version parsing is not working for OpenJDK
[ https://issues.apache.org/jira/browse/FLINK-2132?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14568825#comment-14568825 ]

Matthias J. Sax commented on FLINK-2132:
----------------------------------------

Is anyone working on this? I could take care of it.
[jira] [Commented] (FLINK-2098) Checkpoint barrier initiation at source is not aligned with snapshotting
[ https://issues.apache.org/jira/browse/FLINK-2098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14568766#comment-14568766 ]

ASF GitHub Bot commented on FLINK-2098:
---------------------------------------

Github user aljoscha commented on the pull request:

    https://github.com/apache/flink/pull/742#issuecomment-107860292

    I addressed the problem with the race conditions and re-enabled the Twitter source. Which exceptions are you referring to? I don't think I touched any of the exception handling or the general way that the stream tasks work.

> Checkpoint barrier initiation at source is not aligned with snapshotting
> ------------------------------------------------------------------------
>
>                 Key: FLINK-2098
>                 URL: https://issues.apache.org/jira/browse/FLINK-2098
>             Project: Flink
>          Issue Type: Bug
>          Components: Streaming
>    Affects Versions: 0.9
>            Reporter: Stephan Ewen
>            Assignee: Aljoscha Krettek
>            Priority: Blocker
>             Fix For: 0.9
>
> The stream source does not properly align the emission of checkpoint barriers with the drawing of snapshots.
[GitHub] flink pull request: [FLINK-2098] Ensure checkpoints and element em...
Github user aljoscha commented on the pull request:

    https://github.com/apache/flink/pull/742#issuecomment-107860292

    I addressed the problem with the race conditions and re-enabled the Twitter source. Which exceptions are you referring to? I don't think I touched any of the exception handling or the general way that the stream tasks work.
[jira] [Resolved] (FLINK-2127) The GSA Documentation has trailing </p>s
[ https://issues.apache.org/jira/browse/FLINK-2127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Maximilian Michels resolved FLINK-2127.
---------------------------------------

       Resolution: Fixed
    Fix Version/s: 0.9
         Assignee: Maximilian Michels

Fixed by removing the {{<br>}} tag.
[jira] [Commented] (FLINK-2069) writeAsCSV function in DataStream Scala API creates no file
[ https://issues.apache.org/jira/browse/FLINK-2069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14568847#comment-14568847 ]

Aljoscha Krettek commented on FLINK-2069:
-----------------------------------------

Could you maybe share your code? I now have this example code and it still produces output:

{code}
DataStream<String> text = env.socketTextStream("localhost", /* port elided in original */);

IterativeDataStream<Tuple2<String, Integer>> iterate = text.flatMap(new WordCount.Tokenizer()).iterate();

SplitDataStream<Tuple2<String, Integer>> iterEnd = iterate
        .map(new MapFunction<Tuple2<String, Integer>, Tuple2<String, Integer>>() {
            private static final long serialVersionUID = 1L;

            @Override
            public Tuple2<String, Integer> map(Tuple2<String, Integer> value) throws Exception {
                return new Tuple2<String, Integer>(value.f0, value.f1 + 1);
            }
        })
        .split(new OutputSelector<Tuple2<String, Integer>>() {
            private static final long serialVersionUID = 1L;

            @Override
            public Iterable<String> select(Tuple2<String, Integer> value) {
                if (value.f1 < 10) {
                    return Lists.newArrayList("iter");
                } else {
                    return Lists.newArrayList("end");
                }
            }
        });

iterate.closeWith(iterEnd.select("iter"));

iterEnd.select("end")
        .writeAsCsv("/Users/aljoscha/Downloads/wc-csv-out", FileSystem.WriteMode.OVERWRITE)
        .setParallelism(1);
{code}

Are you using a timeout in your {{iterate()}} call?
[GitHub] flink pull request: [FLINK-1981] add support for GZIP files
Github user rmetzger commented on a diff in the pull request:

    https://github.com/apache/flink/pull/762#discussion_r31559688

    --- Diff: flink-core/src/main/java/org/apache/flink/api/common/io/FileInputFormat.java ---
    @@ -21,10 +21,16 @@
     import java.io.IOException;
     import java.util.ArrayList;
     import java.util.Arrays;
    +import java.util.HashMap;
     import java.util.HashSet;
     import java.util.List;
    +import java.util.Map;
     import java.util.Set;
    +import org.apache.commons.lang3.Validate;
    --- End diff --

    I'm really sorry that you ran into this, but the community recently decided to use Guava's Preconditions.check() instead of commons lang. Can you replace that?
[jira] [Commented] (FLINK-1981) Add GZip support
[ https://issues.apache.org/jira/browse/FLINK-1981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14569601#comment-14569601 ]

ASF GitHub Bot commented on FLINK-1981:
---------------------------------------

Github user rmetzger commented on a diff in the pull request:

    https://github.com/apache/flink/pull/762#discussion_r31560285

    --- Diff: flink-core/src/main/java/org/apache/flink/api/common/io/FileInputFormat.java ---
    @@ -628,9 +692,10 @@ public void open(FileInputSplit fileSplit) throws IOException {
      * @see org.apache.flink.api.common.io.InputStreamFSInputWrapper
      */
     protected FSDataInputStream decorateInputStream(FSDataInputStream inputStream, FileInputSplit fileSplit) throws Throwable {
    -    // Wrap stream in a extracting (decompressing) stream if file ends with ".deflate".
    -    if (fileSplit.getPath().getName().endsWith(DEFLATE_SUFFIX)) {
    -        return new InflaterInputStreamFSInputWrapper(stream);
    +    // Wrap stream in a extracting (decompressing) stream if file ends with a known compression file extension.
    +    InflaterInputStreamFactory<?> inflaterInputStreamFactory = getInflaterInputStreamFactory(fileSplit.getPath());
    +    if (inflaterInputStreamFactory != null) {
    +        return new InputStreamFSInputWrapper(inflaterInputStreamFactory.create(stream));
    --- End diff --

    So if there is no inflater input stream available, it will just fall back to the compressed data stream? Wouldn't it be better to at least log something or fail?

> Add GZip support
> ----------------
>
>                 Key: FLINK-1981
>                 URL: https://issues.apache.org/jira/browse/FLINK-1981
>             Project: Flink
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Sebastian Kruse
>            Assignee: Sebastian Kruse
>            Priority: Minor
>
> GZip, as a commonly used compression format, should be supported in the same way as the already supported deflate files. This allows to use GZip files with any subclass of FileInputFormat.
[GitHub] flink pull request: [FLINK-1981] add support for GZIP files
Github user sekruse commented on a diff in the pull request:

    https://github.com/apache/flink/pull/762#discussion_r31562256

    --- Diff: flink-core/src/main/java/org/apache/flink/api/common/io/FileInputFormat.java ---
    @@ -628,9 +692,10 @@ public void open(FileInputSplit fileSplit) throws IOException {
      * @see org.apache.flink.api.common.io.InputStreamFSInputWrapper
      */
     protected FSDataInputStream decorateInputStream(FSDataInputStream inputStream, FileInputSplit fileSplit) throws Throwable {
    -    // Wrap stream in a extracting (decompressing) stream if file ends with ".deflate".
    -    if (fileSplit.getPath().getName().endsWith(DEFLATE_SUFFIX)) {
    -        return new InflaterInputStreamFSInputWrapper(stream);
    +    // Wrap stream in a extracting (decompressing) stream if file ends with a known compression file extension.
    +    InflaterInputStreamFactory<?> inflaterInputStreamFactory = getInflaterInputStreamFactory(fileSplit.getPath());
    +    if (inflaterInputStreamFactory != null) {
    +        return new InputStreamFSInputWrapper(inflaterInputStreamFactory.create(stream));
    --- End diff --

    It might also be the case that the stream was not compressed at all. It would of course be nice to react appropriately to a missing codec, but how would we know if the current input split belongs to an uncompressed file or a compressed file with an unknown codec?
[jira] [Commented] (FLINK-1981) Add GZip support
[ https://issues.apache.org/jira/browse/FLINK-1981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14569625#comment-14569625 ]

ASF GitHub Bot commented on FLINK-1981:
---------------------------------------

Github user sekruse commented on a diff in the pull request:

    https://github.com/apache/flink/pull/762#discussion_r31562256

    --- Diff: flink-core/src/main/java/org/apache/flink/api/common/io/FileInputFormat.java ---
    @@ -628,9 +692,10 @@ public void open(FileInputSplit fileSplit) throws IOException {
      * @see org.apache.flink.api.common.io.InputStreamFSInputWrapper
      */
     protected FSDataInputStream decorateInputStream(FSDataInputStream inputStream, FileInputSplit fileSplit) throws Throwable {
    -    // Wrap stream in a extracting (decompressing) stream if file ends with ".deflate".
    -    if (fileSplit.getPath().getName().endsWith(DEFLATE_SUFFIX)) {
    -        return new InflaterInputStreamFSInputWrapper(stream);
    +    // Wrap stream in a extracting (decompressing) stream if file ends with a known compression file extension.
    +    InflaterInputStreamFactory<?> inflaterInputStreamFactory = getInflaterInputStreamFactory(fileSplit.getPath());
    +    if (inflaterInputStreamFactory != null) {
    +        return new InputStreamFSInputWrapper(inflaterInputStreamFactory.create(stream));
    --- End diff --

    It might also be the case that the stream was not compressed at all. It would of course be nice to react appropriately to a missing codec, but how would we know if the current input split belongs to an uncompressed file or a compressed file with an unknown codec?
[GitHub] flink pull request: [docs/javadoc][hotfix] Corrected Join hint and...
GitHub user andralungu opened a pull request:

    https://github.com/apache/flink/pull/763

    [docs/javadoc][hotfix] Corrected Join hint and misleading pointer to Spargel

    This PR adds the following patches:
    - in the Iteration programming guide, there was a misleading link to Spargel, which is deprecated; I switched it to point to Gelly;
    - while trying to use join hints, I saw that the JavaDoc for BROADCAST_HASH_SECOND was wrong: "Hint that the *second* join input is much smaller than the *second*." Fixed it to be smaller than the first :)

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/andralungu/flink documPatch

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/flink/pull/763.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #763

----
commit 955ed6e3106d32b22933f0ce637bf38fa2faef72
Author: andralungu <lungu.an...@gmail.com>
Date:   2015-06-02T18:58:10Z

    [docs/javadoc][hotfix] Corrected Join hint and misleading pointer to Spargel
[jira] [Commented] (FLINK-1981) Add GZip support
[ https://issues.apache.org/jira/browse/FLINK-1981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14569638#comment-14569638 ] ASF GitHub Bot commented on FLINK-1981: --- Github user rmetzger commented on a diff in the pull request: https://github.com/apache/flink/pull/762#discussion_r31562955
--- Diff: flink-core/src/main/java/org/apache/flink/api/common/io/FileInputFormat.java ---
@@ -628,9 +692,10 @@ public void open(FileInputSplit fileSplit) throws IOException {
  * @see org.apache.flink.api.common.io.InputStreamFSInputWrapper
  */
 protected FSDataInputStream decorateInputStream(FSDataInputStream inputStream, FileInputSplit fileSplit) throws Throwable {
-    // Wrap stream in an extracting (decompressing) stream if file ends with .deflate.
-    if (fileSplit.getPath().getName().endsWith(DEFLATE_SUFFIX)) {
-        return new InflaterInputStreamFSInputWrapper(stream);
+    // Wrap stream in an extracting (decompressing) stream if file ends with a known compression file extension.
+    InflaterInputStreamFactory<?> inflaterInputStreamFactory = getInflaterInputStreamFactory(fileSplit.getPath());
+    if (inflaterInputStreamFactory != null) {
+        return new InputStreamFSInputWrapper(inflaterInputStreamFactory.create(stream));
--- End diff --
Ah, okay, I see. I didn't read the code closely enough.
Add GZip support Key: FLINK-1981 URL: https://issues.apache.org/jira/browse/FLINK-1981 Project: Flink Issue Type: New Feature Components: Core Reporter: Sebastian Kruse Assignee: Sebastian Kruse Priority: Minor
GZip, as a commonly used compression format, should be supported in the same way as the already supported deflate files. This allows GZip files to be used with any subclass of FileInputFormat.
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-1981) Add GZip support
[ https://issues.apache.org/jira/browse/FLINK-1981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14569589#comment-14569589 ] ASF GitHub Bot commented on FLINK-1981: --- Github user rmetzger commented on a diff in the pull request: https://github.com/apache/flink/pull/762#discussion_r31559688
--- Diff: flink-core/src/main/java/org/apache/flink/api/common/io/FileInputFormat.java ---
@@ -21,10 +21,16 @@
 import java.io.IOException;
 import java.util.ArrayList;
 import java.util.Arrays;
+import java.util.HashMap;
 import java.util.HashSet;
 import java.util.List;
+import java.util.Map;
 import java.util.Set;
+import org.apache.commons.lang3.Validate;
--- End diff --
I'm really sorry that you ran into this, but the community recently decided to use Guava's Preconditions.check() instead of commons lang. Can you replace that?
Add GZip support Key: FLINK-1981 URL: https://issues.apache.org/jira/browse/FLINK-1981 Project: Flink Issue Type: New Feature Components: Core Reporter: Sebastian Kruse Assignee: Sebastian Kruse Priority: Minor
GZip, as a commonly used compression format, should be supported in the same way as the already supported deflate files. This allows GZip files to be used with any subclass of FileInputFormat.
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[GitHub] flink pull request: [FLINK-1981] add support for GZIP files
Github user rmetzger commented on a diff in the pull request: https://github.com/apache/flink/pull/762#discussion_r31562955
--- Diff: flink-core/src/main/java/org/apache/flink/api/common/io/FileInputFormat.java ---
@@ -628,9 +692,10 @@ public void open(FileInputSplit fileSplit) throws IOException {
  * @see org.apache.flink.api.common.io.InputStreamFSInputWrapper
  */
 protected FSDataInputStream decorateInputStream(FSDataInputStream inputStream, FileInputSplit fileSplit) throws Throwable {
-    // Wrap stream in an extracting (decompressing) stream if file ends with .deflate.
-    if (fileSplit.getPath().getName().endsWith(DEFLATE_SUFFIX)) {
-        return new InflaterInputStreamFSInputWrapper(stream);
+    // Wrap stream in an extracting (decompressing) stream if file ends with a known compression file extension.
+    InflaterInputStreamFactory<?> inflaterInputStreamFactory = getInflaterInputStreamFactory(fileSplit.getPath());
+    if (inflaterInputStreamFactory != null) {
+        return new InputStreamFSInputWrapper(inflaterInputStreamFactory.create(stream));
--- End diff --
Ah, okay, I see. I didn't read the code closely enough.
--- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
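The mechanism under discussion in this diff can be sketched with JDK streams only: map a file extension to a factory that decorates the raw input stream, and pass the stream through unchanged when no factory matches. This is an illustrative sketch, not Flink's actual API; it also makes the reviewer's point concrete, since an unmatched suffix could equally mean "uncompressed" or "unknown codec".

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;
import java.util.zip.InflaterInputStream;

public class SuffixInflaters {

    // registry: file extension -> decompressing stream factory
    static final Map<String, Function<InputStream, InputStream>> FACTORIES = new HashMap<>();
    static {
        FACTORIES.put("deflate", InflaterInputStream::new);
        FACTORIES.put("gz", in -> {
            try {
                return new GZIPInputStream(in);
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        });
    }

    /** Returns the factory registered for the file's extension, or null. */
    static Function<InputStream, InputStream> lookup(String fileName) {
        int dot = fileName.lastIndexOf('.');
        return dot < 0 ? null : FACTORIES.get(fileName.substring(dot + 1));
    }

    /** Decorates the stream if a factory matches, otherwise passes it through. */
    static InputStream decorate(String fileName, InputStream raw) {
        Function<InputStream, InputStream> factory = lookup(fileName);
        return factory == null ? raw : factory.apply(raw);
    }

    public static void main(String[] args) throws IOException {
        // gzip "hello" in memory, then read it back through the decorated stream
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bytes)) {
            gz.write("hello".getBytes("UTF-8"));
        }
        InputStream in = decorate("part-0.gz", new ByteArrayInputStream(bytes.toByteArray()));
        byte[] buf = new byte[16];
        int n = in.read(buf);
        System.out.println(new String(buf, 0, n, "UTF-8")); // prints "hello"
    }
}
```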
[jira] [Commented] (FLINK-2069) writeAsCSV function in DataStream Scala API creates no file
[ https://issues.apache.org/jira/browse/FLINK-2069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14569675#comment-14569675 ] ASF GitHub Bot commented on FLINK-2069: --- Github user rmetzger commented on the pull request: https://github.com/apache/flink/pull/759#issuecomment-108083643 Are there any tests for the streaming output formats? writeAsCSV function in DataStream Scala API creates no file --- Key: FLINK-2069 URL: https://issues.apache.org/jira/browse/FLINK-2069 Project: Flink Issue Type: Bug Components: Streaming Reporter: Faye Beligianni Priority: Blocker Labels: Streaming Fix For: 0.9 When the {{writeAsCSV}} function is used in the DataStream Scala API, no file is created in the specified path. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[GitHub] flink pull request: [FLINK-2069] Fix Scala CSV Output Format
Github user rmetzger commented on the pull request: https://github.com/apache/flink/pull/759#issuecomment-108083643 Are there any tests for the streaming output formats? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[jira] [Commented] (FLINK-1319) Add static code analysis for UDFs
[ https://issues.apache.org/jira/browse/FLINK-1319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14569669#comment-14569669 ] ASF GitHub Bot commented on FLINK-1319: --- Github user rmetzger commented on the pull request: https://github.com/apache/flink/pull/729#issuecomment-108081964 How about extending the `UDF contains obvious errors` message with some notes on how to completely disable the SCA? I fear that the message appears (blocks the program execution) due to a bug in the SCA and then users don't know how to get their stuff to run.
Add static code analysis for UDFs - Key: FLINK-1319 URL: https://issues.apache.org/jira/browse/FLINK-1319 Project: Flink Issue Type: New Feature Components: Java API, Scala API Reporter: Stephan Ewen Assignee: Timo Walther Priority: Minor
Flink's Optimizer takes information that tells it for UDFs which fields of the input elements are accessed, modified, or forwarded/copied. This information frequently helps to reuse partitionings, sorts, etc. It may speed up programs significantly, as it can frequently eliminate sorts and shuffles, which are costly. Right now, users can add lightweight annotations to UDFs to provide this information (such as adding {{@ConstantFields(0->3, 1, 2->1)}}). We worked with static code analysis of UDFs before, to determine this information automatically. This is an incredible feature, as it magically makes programs faster. For record-at-a-time operations (Map, Reduce, FlatMap, Join, Cross), this works surprisingly well in many cases. We used the Soot toolkit for the static code analysis. Unfortunately, Soot is LGPL licensed and thus we did not include any of the code so far. I propose to add this functionality to Flink, in the form of a drop-in addition, to work around the LGPL incompatibility with ASL 2.0. Users could simply download a special flink-code-analysis.jar and drop it into the lib folder to enable this functionality.
We may even add a script to tools that downloads that library automatically into the lib folder. This should be legally fine, since we do not redistribute LGPL code and only dynamically link it (the incompatibility with ASL 2.0 is mainly in the patentability, if I remember correctly). Prior work on this has been done by [~aljoscha] and [~skunert], which could provide a code base to start with. *Appendix* Homepage of the Soot static analysis toolkit: http://www.sable.mcgill.ca/soot/ Papers on static analysis for optimization: http://stratosphere.eu/assets/papers/EnablingOperatorReorderingSCA_12.pdf and http://stratosphere.eu/assets/papers/openingTheBlackBoxes_12.pdf Quick introduction to the Optimizer: http://stratosphere.eu/assets/papers/2014-VLDBJ_Stratosphere_Overview.pdf (Section 6) Optimizer for Iterations: http://stratosphere.eu/assets/papers/spinningFastIterativeDataFlows_12.pdf (Sections 4.3 and 5.3) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[GitHub] flink pull request: [FLINK-1319][core] Add static code analysis fo...
Github user rmetzger commented on the pull request: https://github.com/apache/flink/pull/729#issuecomment-108081964 How about extending the `UDF contains obvious errors` message with some notes on how to completely disable the SCA. I fear that the message appears (blocks the program execution) due to a bug in the SCA and then users don't know how to get their stuff to run. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] flink pull request: [FLINK-1319][core] Add static code analysis fo...
Github user uce commented on the pull request: https://github.com/apache/flink/pull/729#issuecomment-108093699 I agree with Robert, but for the initial version it won't matter as it should be disabled anyways. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] flink pull request: [FLINK-2069] Fix Scala CSV Output Format
Github user mbalassi commented on the pull request: https://github.com/apache/flink/pull/759#issuecomment-108093780 Not enough. `writeAsText` is covered a number of times in the integration tests and ITCases, but there is no test for other output formats. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[jira] [Commented] (FLINK-2069) writeAsCSV function in DataStream Scala API creates no file
[ https://issues.apache.org/jira/browse/FLINK-2069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14569725#comment-14569725 ] ASF GitHub Bot commented on FLINK-2069: --- Github user mbalassi commented on the pull request: https://github.com/apache/flink/pull/759#issuecomment-108093780 Not enough. `writeAsText` is covered a number of times in the integration tests and ITCases, but there is no test for other output formats. writeAsCSV function in DataStream Scala API creates no file --- Key: FLINK-2069 URL: https://issues.apache.org/jira/browse/FLINK-2069 Project: Flink Issue Type: Bug Components: Streaming Reporter: Faye Beligianni Priority: Blocker Labels: Streaming Fix For: 0.9 When the {{writeAsCSV}} function is used in the DataStream Scala API, no file is created in the specified path. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-1319) Add static code analysis for UDFs
[ https://issues.apache.org/jira/browse/FLINK-1319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14569723#comment-14569723 ] ASF GitHub Bot commented on FLINK-1319: --- Github user uce commented on the pull request: https://github.com/apache/flink/pull/729#issuecomment-108093699 I agree with Robert, but for the initial version it won't matter as it should be disabled anyways.
Add static code analysis for UDFs - Key: FLINK-1319 URL: https://issues.apache.org/jira/browse/FLINK-1319 Project: Flink Issue Type: New Feature Components: Java API, Scala API Reporter: Stephan Ewen Assignee: Timo Walther Priority: Minor
Flink's Optimizer takes information that tells it for UDFs which fields of the input elements are accessed, modified, or forwarded/copied. This information frequently helps to reuse partitionings, sorts, etc. It may speed up programs significantly, as it can frequently eliminate sorts and shuffles, which are costly. Right now, users can add lightweight annotations to UDFs to provide this information (such as adding {{@ConstantFields(0->3, 1, 2->1)}}). We worked with static code analysis of UDFs before, to determine this information automatically. This is an incredible feature, as it magically makes programs faster. For record-at-a-time operations (Map, Reduce, FlatMap, Join, Cross), this works surprisingly well in many cases. We used the Soot toolkit for the static code analysis. Unfortunately, Soot is LGPL licensed and thus we did not include any of the code so far. I propose to add this functionality to Flink, in the form of a drop-in addition, to work around the LGPL incompatibility with ASL 2.0. Users could simply download a special flink-code-analysis.jar and drop it into the lib folder to enable this functionality. We may even add a script to tools that downloads that library automatically into the lib folder.
This should be legally fine, since we do not redistribute LGPL code and only dynamically link it (the incompatibility with ASL 2.0 is mainly in the patentability, if I remember correctly). Prior work on this has been done by [~aljoscha] and [~skunert], which could provide a code base to start with. *Appendix* Homepage of the Soot static analysis toolkit: http://www.sable.mcgill.ca/soot/ Papers on static analysis for optimization: http://stratosphere.eu/assets/papers/EnablingOperatorReorderingSCA_12.pdf and http://stratosphere.eu/assets/papers/openingTheBlackBoxes_12.pdf Quick introduction to the Optimizer: http://stratosphere.eu/assets/papers/2014-VLDBJ_Stratosphere_Overview.pdf (Section 6) Optimizer for Iterations: http://stratosphere.eu/assets/papers/spinningFastIterativeDataFlows_12.pdf (Sections 4.3 and 5.3) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (FLINK-2108) Add score function for Predictors
[ https://issues.apache.org/jira/browse/FLINK-2108?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sachin Goel reassigned FLINK-2108: -- Assignee: Sachin Goel Add score function for Predictors - Key: FLINK-2108 URL: https://issues.apache.org/jira/browse/FLINK-2108 Project: Flink Issue Type: Improvement Components: Machine Learning Library Reporter: Theodore Vasiloudis Assignee: Sachin Goel Priority: Minor Labels: ML A score function for Predictor implementations should take a DataSet[(I, O)] and an (optional) scoring measure and return a score. The DataSet[(I, O)] would probably be the output of the predict function. For example in MultipleLinearRegression, we can call predict on a labeled dataset, get back predictions for each item in the data, and then call score with the resulting dataset as an argument and we should get back a score for the prediction quality, such as the R^2 score. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
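The R^2 score mentioned in the issue can be sketched in a few lines over (prediction, label) pairs, such as the output of a predict call. This is an illustrative standalone version, not FlinkML's actual API.

```java
public class RSquared {

    static double score(double[] predictions, double[] labels) {
        double mean = 0.0;
        for (double y : labels) mean += y;
        mean /= labels.length;

        double ssRes = 0.0, ssTot = 0.0;
        for (int i = 0; i < labels.length; i++) {
            ssRes += (labels[i] - predictions[i]) * (labels[i] - predictions[i]);
            ssTot += (labels[i] - mean) * (labels[i] - mean);
        }
        // R^2 = 1 - SS_res / SS_tot; 1.0 means a perfect fit,
        // 0.0 means no better than predicting the label mean
        return 1.0 - ssRes / ssTot;
    }

    public static void main(String[] args) {
        double[] labels = {1.0, 2.0, 3.0};
        System.out.println(score(new double[] {1.0, 2.0, 3.0}, labels)); // prints 1.0
        System.out.println(score(new double[] {2.0, 2.0, 2.0}, labels)); // prints 0.0
    }
}
```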
[GitHub] flink pull request: Hits
GitHub user mfahimazizi opened a pull request: https://github.com/apache/flink/pull/765 Hits HITS algorithm in Gelly. You can merge this pull request into a Git repository by running: $ git pull https://github.com/mfahimazizi/flink HITS Alternatively you can review and apply these changes as the patch at: https://github.com/apache/flink/pull/765.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #765 commit e3fe353a5a4fbca35a65672214f451cda58de6ab Author: mfahimazizi mfahimaz...@gmail.com Date: 2015-05-30T21:53:30Z HITS algorithm added commit 646660272c4c9b4152d2ebfc4bfbd6e04d891408 Author: mfahimazizi mfahimaz...@gmail.com Date: 2015-06-02T23:23:15Z HITS algorithm added_ --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
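For context on what the PR contributes, the HITS update rule can be sketched with a JDK-only power iteration over (source, target) edge pairs; the PR's Gelly implementation is vertex-centric, so this standalone version only illustrates the hub/authority updates, not the actual code under review.

```java
import java.util.Arrays;

public class HitsSketch {

    static double[][] hits(int numVertices, int[][] edges, int iterations) {
        double[] hub = new double[numVertices];
        double[] auth = new double[numVertices];
        Arrays.fill(hub, 1.0);
        for (int it = 0; it < iterations; it++) {
            // authority(v) = sum of hub scores of vertices linking to v
            Arrays.fill(auth, 0.0);
            for (int[] e : edges) auth[e[1]] += hub[e[0]];
            normalize(auth);
            // hub(v) = sum of authority scores of vertices v links to
            Arrays.fill(hub, 0.0);
            for (int[] e : edges) hub[e[0]] += auth[e[1]];
            normalize(hub);
        }
        return new double[][] {hub, auth};
    }

    static void normalize(double[] v) {
        double norm = 0.0;
        for (double x : v) norm += x * x;
        norm = Math.sqrt(norm);
        if (norm > 0) for (int i = 0; i < v.length; i++) v[i] /= norm;
    }

    public static void main(String[] args) {
        // 0 -> 2 and 1 -> 2: vertex 2 is the sole authority, 0 and 1 are equal hubs
        double[][] scores = hits(3, new int[][] {{0, 2}, {1, 2}}, 10);
        System.out.println("hubs:        " + Arrays.toString(scores[0]));
        System.out.println("authorities: " + Arrays.toString(scores[1]));
    }
}
```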
[jira] [Created] (FLINK-2138) PartitionCustom for streaming
Márton Balassi created FLINK-2138: - Summary: PartitionCustom for streaming Key: FLINK-2138 URL: https://issues.apache.org/jira/browse/FLINK-2138 Project: Flink Issue Type: New Feature Components: Streaming Affects Versions: 0.9 Reporter: Márton Balassi Priority: Minor The batch API has support for custom partitioning; this should be added for streaming with a similar signature. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
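The contract the batch API exposes via partitionCustom can be sketched without Flink: a user-supplied Partitioner maps a key to one of numPartitions channels, and the runtime routes each record accordingly. This standalone version only illustrates the routing logic; the interface name mirrors Flink's but the surrounding code is an assumption.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class CustomPartitioning {

    interface Partitioner<K> {
        int partition(K key, int numPartitions);
    }

    // route each key to the channel chosen by the user-supplied partitioner
    static <K> List<List<K>> route(List<K> keys, Partitioner<K> partitioner, int numPartitions) {
        List<List<K>> channels = new ArrayList<>();
        for (int i = 0; i < numPartitions; i++) channels.add(new ArrayList<>());
        for (K key : keys) {
            channels.get(partitioner.partition(key, numPartitions)).add(key);
        }
        return channels;
    }

    public static void main(String[] args) {
        // e.g. send even keys to channel 0 and odd keys to channel 1
        Partitioner<Integer> evenOdd = (key, n) -> key % 2;
        System.out.println(route(Arrays.asList(1, 2, 3, 4, 5), evenOdd, 2));
        // prints [[2, 4], [1, 3, 5]]
    }
}
```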
[GitHub] flink pull request: [contrib] Storm compatibility
GitHub user mbalassi opened a pull request: https://github.com/apache/flink/pull/764 [contrib] Storm compatibility
This is an updated version of #573. @mjsax addressed a number of comments since then and added READMEs, @szape did a code review and added an extra interface, and I have moved the codebase to flink-contrib. For the latter to be reasonable, I felt it necessary to break up flink-contrib into smaller submodules, despite @rmetzger advising against this approach. The names of the submodules are questionable; suggestions are welcome. I've seen test failures in storm-compatibility-core, @mjsax could you take a look please? I managed to make an error in the pom for the storm-compatibility examples, so that does not build currently; I will check it after I get some sleep. :) I would like to merge this to master as soon as the 0.9 branch is forked off and we can agree on it.
You can merge this pull request into a Git repository by running: $ git pull https://github.com/mbalassi/flink storm Alternatively you can review and apply these changes as the patch at: https://github.com/apache/flink/pull/764.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #764
commit c595c265d7b40106181e16b594a34cdf06beb017 Author: mjsax mj...@informatik.hu-berlin.de Date: 2015-05-13T15:56:12Z [storm-compat] Introduced Storm wrappers to Flink
commit 3db5335639f9d8fd0cbb1c7799af6205f8c04996 Author: mjsax mj...@informatik.hu-berlin.de Date: 2015-05-14T10:13:20Z [strom-compat] Added Storm API compatibility classes
commit 5ed5bbe5fa92e967de8ad448955aec7115fcd533 Author: mjsax mj...@informatik.hu-berlin.de Date: 2015-05-14T10:26:43Z [storm-compat] Added tests for Storm compatibility API
commit 9e89189ae7d0e7c48a1b9cc466e871d6919144c0 Author: mjsax mj...@informatik.hu-berlin.de Date: 2015-05-14T10:28:59Z [storm-compat] Added tests for Storm compatibility wrappers
commit ebdc0d23a6502de542d269a6f0881c89057d2e0e
Author: mjsax mj...@informatik.hu-berlin.de Date: 2015-05-14T10:55:44Z [storm-compat] Added abstract base to Storm compatibility examples commit 09ac389630f231f5a793da97b993c52b0e16777b Author: mjsax mj...@informatik.hu-berlin.de Date: 2015-05-14T10:57:46Z [storm-compat] Added Storm compatibility word count examples commit 5cf034e1d34e7ccfd745e2f7e93f2d247d3e6bbf Author: mjsax mj...@informatik.hu-berlin.de Date: 2015-05-14T10:59:38Z [storm-compat] Added ITCases to Storm compatibility examples commit 2eec0ac002d04ecfc09bf524ef3923979c141558 Author: mjsax mj...@informatik.hu-berlin.de Date: 2015-05-14T11:04:01Z [storm-compat] Added README files to Storm compatibility modules commit 2d200c0d4bc16c9f8c4999d5d4fa07e527f3824c Author: mjsax mj...@informatik.hu-berlin.de Date: 2015-05-21T08:15:20Z [storm-compat] Storm compatibility code cleanup commit b06ec1bc37c3d68a4e4f3568e9c37a15bc3881e8 Author: mjsax mj...@informatik.hu-berlin.de Date: 2015-05-21T08:16:58Z [storm-compat] Simple examples added commit 61ab549fd0a25a09df7b8cfcd16372b93b2cceec Author: szape nemderogator...@gmail.com Date: 2015-05-28T14:31:18Z [storm-compat] Storm compatibility layer wrappers refactor commit ed189aac5de79f1af50efcbb0a1d9defcc582d1a Author: szape nemderogator...@gmail.com Date: 2015-06-02T08:53:02Z [storm-compat] Added FiniteStormSpout interface commit 9704649267dd29291bb14f223f17a18fc2c26cd0 Author: mbalassi mbala...@apache.org Date: 2015-06-02T21:38:29Z [storm-compat] Moved Storm compatibility to flink-contrib commit 3db8724d92578f3815229c0ed2d6dafd7ddf6fd5 Author: mbalassi mbala...@apache.org Date: 2015-06-02T22:25:21Z [contrib] [build] Contrib separated to small projects --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[jira] [Created] (FLINK-2139) Test Streaming Outputformats
Márton Balassi created FLINK-2139: - Summary: Test Streaming Outputformats Key: FLINK-2139 URL: https://issues.apache.org/jira/browse/FLINK-2139 Project: Flink Issue Type: Test Components: Streaming Affects Versions: 0.9 Reporter: Márton Balassi Fix For: 0.9 Currently the only tested streaming core output format is writeAsText, and it is only tested indirectly in integration tests. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (FLINK-2136) Test the streaming scala API
Márton Balassi created FLINK-2136: - Summary: Test the streaming scala API Key: FLINK-2136 URL: https://issues.apache.org/jira/browse/FLINK-2136 Project: Flink Issue Type: Test Components: Scala API, Streaming Affects Versions: 0.9 Reporter: Márton Balassi There are no tests covering the streaming Scala API. I would suggest testing whether the StreamGraph created by a certain operation looks as expected. Deeper layers and the runtime should not be tested here; that is done in streaming-core. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[GitHub] flink pull request: [contrib] Storm compatibility
Github user mbalassi commented on the pull request: https://github.com/apache/flink/pull/764#issuecomment-108118618 Oh, and one more info for @mjsax: I added you as collaborator to my repo, so you can directly push to this branch. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[jira] [Created] (FLINK-2137) Expose partitionByHash for WindowedDataStream
Márton Balassi created FLINK-2137: - Summary: Expose partitionByHash for WindowedDataStream Key: FLINK-2137 URL: https://issues.apache.org/jira/browse/FLINK-2137 Project: Flink Issue Type: New Feature Components: Streaming Affects Versions: 0.9 Reporter: Márton Balassi Assignee: Gábor Hermann This functionality has been recently exposed for DataStreams and ConnectedDataStreams, but not for WindowedDataStreams yet. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[GitHub] flink pull request: [FLINK-2069] Fix Scala CSV Output Format
Github user mbalassi commented on the pull request: https://github.com/apache/flink/pull/759#issuecomment-108122772 Added a JIRA for the [issue](https://issues.apache.org/jira/browse/FLINK-2139). --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] flink pull request: Remove extra HTML tags in TypeInformation Java...
GitHub user hsaputra opened a pull request: https://github.com/apache/flink/pull/766 Remove extra HTML tags in TypeInformation JavaDoc class header. Simple cleanup PR removing extra ```li``` HTML tags in the TypeInformation JavaDoc class header. You can merge this pull request into a Git repository by running: $ git pull https://github.com/hsaputra/flink remove_extra_li_javadoc_typeinformation Alternatively you can review and apply these changes as the patch at: https://github.com/apache/flink/pull/766.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #766 commit f02fc66b9897cd6c76f193ef514b41388e7bf537 Author: Henry Saputra hsapu...@apache.org Date: 2015-06-03T04:10:06Z Remove extra li HTML tags in TypeInformation JavaDoc class header. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[jira] [Commented] (FLINK-1319) Add static code analysis for UDFs
[ https://issues.apache.org/jira/browse/FLINK-1319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14568988#comment-14568988 ] ASF GitHub Bot commented on FLINK-1319: --- Github user rmetzger commented on the pull request: https://github.com/apache/flink/pull/729#issuecomment-107924813 I understand that concern. But using sysout will also be interleaved with the client's sysout printing and the regular system logging. Also, I think it's very bad practice to print stuff using System.out, because it's not controllable in any way. With log4j we can configure the analysis output the way we want. If you want the messages to look like regular sysout text, we can specify a custom output schema for the classes in the sca Java package.
Add static code analysis for UDFs - Key: FLINK-1319 URL: https://issues.apache.org/jira/browse/FLINK-1319 Project: Flink Issue Type: New Feature Components: Java API, Scala API Reporter: Stephan Ewen Assignee: Timo Walther Priority: Minor
Flink's Optimizer takes information that tells it for UDFs which fields of the input elements are accessed, modified, or forwarded/copied. This information frequently helps to reuse partitionings, sorts, etc. It may speed up programs significantly, as it can frequently eliminate sorts and shuffles, which are costly. Right now, users can add lightweight annotations to UDFs to provide this information (such as adding {{@ConstantFields(0->3, 1, 2->1)}}). We worked with static code analysis of UDFs before, to determine this information automatically. This is an incredible feature, as it magically makes programs faster. For record-at-a-time operations (Map, Reduce, FlatMap, Join, Cross), this works surprisingly well in many cases. We used the Soot toolkit for the static code analysis. Unfortunately, Soot is LGPL licensed and thus we did not include any of the code so far.
I propose to add this functionality to Flink, in the form of a drop-in addition, to work around the LGPL incompatibility with ASL 2.0. Users could simply download a special flink-code-analysis.jar and drop it into the lib folder to enable this functionality. We may even add a script to tools that downloads that library automatically into the lib folder. This should be legally fine, since we do not redistribute LGPL code and only dynamically link it (the incompatibility with ASL 2.0 is mainly in the patentability, if I remember correctly). Prior work on this has been done by [~aljoscha] and [~skunert], which could provide a code base to start with. *Appendix* Homepage of the Soot static analysis toolkit: http://www.sable.mcgill.ca/soot/ Papers on static analysis for optimization: http://stratosphere.eu/assets/papers/EnablingOperatorReorderingSCA_12.pdf and http://stratosphere.eu/assets/papers/openingTheBlackBoxes_12.pdf Quick introduction to the Optimizer: http://stratosphere.eu/assets/papers/2014-VLDBJ_Stratosphere_Overview.pdf (Section 6) Optimizer for Iterations: http://stratosphere.eu/assets/papers/spinningFastIterativeDataFlows_12.pdf (Sections 4.3 and 5.3) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[GitHub] flink pull request: [FLINK-1430] [streaming] Scala API completenes...
Github user asfgit closed the pull request at: https://github.com/apache/flink/pull/753 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] flink pull request: [FLINK-1319][core] Add static code analysis fo...
Github user rmetzger commented on the pull request: https://github.com/apache/flink/pull/729#issuecomment-107924813 I understand that concern. But using sysout will also be interleaved with the client's sysout printing and the regular system logging. Also, I think it's very bad practice to print stuff using System.out, because it's not controllable in any way. With log4j we can configure the analysis output the way we want. If you want the messages to look like regular sysout text, we can specify a custom output schema for the classes in the sca Java package. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
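The "custom output schema for the classes in the sca package" idea could look roughly like the following log4j.properties fragment. This is a hedged sketch: the logger/appender names are illustrative and the package name is an assumption, not confirmed by the thread.

```properties
# Route the analyzer's output through its own logger so it can be silenced
# or reformatted independently of the client's sysout printing.
# NOTE: the package name below is an assumption for illustration.
log4j.logger.org.apache.flink.api.java.sca=INFO, sca
log4j.additivity.org.apache.flink.api.java.sca=false
log4j.appender.sca=org.apache.log4j.ConsoleAppender
log4j.appender.sca.layout=org.apache.log4j.PatternLayout
# plain message only, so the analysis output reads like regular sysout text
log4j.appender.sca.layout.ConversionPattern=%m%n
```

Setting the logger to OFF instead of INFO would be one way to "completely disable" the output, as requested earlier in the thread.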
[jira] [Commented] (FLINK-2111) Add terminate signal to cleanly stop streaming jobs
[ https://issues.apache.org/jira/browse/FLINK-2111?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14568994#comment-14568994 ] ASF GitHub Bot commented on FLINK-2111: --- Github user mjsax commented on the pull request: https://github.com/apache/flink/pull/750#issuecomment-107926235 Hi, after thinking about the signal name again, I disagree. terminate does not mean kill; it is a request to shut down gracefully. stop, from my point of view, is too soft, as it does not indicate strongly enough that the job is completely shut down and cleared. I would interpret stop more as an interrupt signal (with a corresponding re-start signal that wakes up the job again). Any other opinions? Maybe we need a vote on the signal name.
Add terminate signal to cleanly stop streaming jobs - Key: FLINK-2111 URL: https://issues.apache.org/jira/browse/FLINK-2111 Project: Flink Issue Type: Improvement Components: Distributed Runtime, JobManager, Local Runtime, Streaming, TaskManager, Webfrontend Reporter: Matthias J. Sax Assignee: Matthias J. Sax Priority: Minor
Currently, streaming jobs can only be stopped using the cancel command, which is a hard stop with no clean shutdown. The newly introduced terminate signal will only affect streaming source tasks, so that the sources can stop emitting data and terminate cleanly, resulting in a clean termination of the whole streaming job. This feature is a prerequisite for https://issues.apache.org/jira/browse/FLINK-1929
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
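Whatever the signal ends up being called, the shutdown style it implies can be sketched with a plain flag-checked source loop: only the source observes the signal, finishes its loop cleanly, and downstream processing drains naturally instead of being killed mid-record. All names here are illustrative, not Flink's runtime API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicBoolean;

public class GracefulSource {

    private final AtomicBoolean running = new AtomicBoolean(true);
    final List<Integer> emitted = new ArrayList<>();

    void run() {
        int next = 0;
        // keep emitting until the terminate/stop signal arrives
        // (bounded here only so the sketch always finishes)
        while (running.get() && next < 1_000_000) {
            emitted.add(next++);
        }
        // clean shutdown point: flush buffers, close connections, etc.
    }

    void signalStop() {
        running.set(false);
    }

    public static void main(String[] args) throws InterruptedException {
        GracefulSource source = new GracefulSource();
        Thread worker = new Thread(source::run);
        worker.start();
        Thread.sleep(10);      // let the source emit for a moment
        source.signalStop();   // request a graceful shutdown
        worker.join(1000);     // the loop exits on its own, no interrupt needed
        System.out.println("terminated cleanly after " + source.emitted.size() + " records");
    }
}
```

By contrast, cancel in this picture would interrupt the worker thread outright, with no chance to reach the clean shutdown point.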
[GitHub] flink pull request: [ml] Rework of the optimization framework
GitHub user tillrohrmann opened a pull request: https://github.com/apache/flink/pull/758 [ml] Rework of the optimization framework This PR reworks the current optimization framework to make the regularization part of the optimization algorithm. Furthermore, it consolidates the `LossFunction` by defining a common interface for loss functions. The `GenericLossFunction` allows constructing a loss function by specifying the outer loss function and the prediction function individually. Additionally, this PR introduces some syntactic sugar which makes programming with broadcast variables easier. You can merge this pull request into a Git repository by running: $ git pull https://github.com/tillrohrmann/flink mlSugar Alternatively you can review and apply these changes as the patch at: https://github.com/apache/flink/pull/758.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #758 commit e82782e5a165d9e3a348db5efe6c8188c9cbac72 Author: Till Rohrmann trohrm...@apache.org Date: 2015-05-28T01:03:24Z [ml] Adds syntactic sugar for map with single broadcast element. Rewrites the optimization framework to consolidate the loss function. Adds closure cleaner to convenience functions on RichDataSet Removing regularization from LossFunction and making it part of the optimizer.
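The composition idea behind `GenericLossFunction` can be sketched outside Flink (class and method names here are illustrative, not the FlinkML API): the loss is built from an outer loss and a prediction function that can be chosen independently.

```java
import java.util.function.DoubleBinaryOperator;

// Illustrative sketch of a loss composed from two pluggable parts;
// a single scalar weight/feature is used for brevity.
class GenericLoss {
    private final DoubleBinaryOperator outerLoss;   // (prediction, label) -> loss
    private final DoubleBinaryOperator prediction;  // (weight, feature)  -> prediction

    GenericLoss(DoubleBinaryOperator outerLoss, DoubleBinaryOperator prediction) {
        this.outerLoss = outerLoss;
        this.prediction = prediction;
    }

    /** Evaluates outerLoss(prediction(weight, feature), label). */
    double loss(double weight, double feature, double label) {
        return outerLoss.applyAsDouble(
                prediction.applyAsDouble(weight, feature), label);
    }
}
```

For example, squared loss with a linear prediction is `new GenericLoss((p, y) -> 0.5 * (p - y) * (p - y), (w, x) -> w * x)`; swapping in a different outer loss reuses the same prediction function, which is the consolidation the PR aims for.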
[jira] [Resolved] (FLINK-2102) Add predict operation for LabeledVector
[ https://issues.apache.org/jira/browse/FLINK-2102?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Till Rohrmann resolved FLINK-2102. -- Resolution: Fixed Added via d163a817fa2e330e86384d0bbcd104f051a6fb48 Add predict operation for LabeledVector --- Key: FLINK-2102 URL: https://issues.apache.org/jira/browse/FLINK-2102 Project: Flink Issue Type: Improvement Components: Machine Learning Library Reporter: Theodore Vasiloudis Assignee: Theodore Vasiloudis Priority: Minor Labels: ML Fix For: 0.9 Currently we can only call predict on DataSet[V : Vector]. A lot of times though we have a DataSet[LabeledVector] that we split into a train and test set. We should be able to make predictions on the test DataSet[LabeledVector] without having to transform it into a DataSet[Vector]
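The convenience the issue asks for can be sketched with plain collections (the types and method below are illustrative, not the FlinkML API): predicting directly on labeled data carries the true label through, so the result pairs each label with its prediction without a separate stripping step.

```java
import java.util.List;
import java.util.function.DoubleUnaryOperator;
import java.util.stream.Collectors;

// Hypothetical labeled point with a single feature, for brevity.
class LabeledPoint {
    final double label;
    final double feature;

    LabeledPoint(double label, double feature) {
        this.label = label;
        this.feature = feature;
    }
}

class Predictor {
    /** Returns {trueLabel, prediction} pairs for a labeled test set. */
    static List<double[]> predictLabeled(List<LabeledPoint> testSet,
                                         DoubleUnaryOperator model) {
        return testSet.stream()
                .map(p -> new double[] { p.label, model.applyAsDouble(p.feature) })
                .collect(Collectors.toList());
    }
}
```

With only a predict-on-`Vector` operation, the caller would first have to drop the labels and then join them back to evaluate the model; keeping the label in the result removes that round trip.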
[jira] [Commented] (FLINK-2102) Add predict operation for LabeledVector
[ https://issues.apache.org/jira/browse/FLINK-2102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14568966#comment-14568966 ] ASF GitHub Bot commented on FLINK-2102: --- Github user asfgit closed the pull request at: https://github.com/apache/flink/pull/744
[GitHub] flink pull request: [FLINK-2102] [ml] Add predict operation for La...
Github user asfgit closed the pull request at: https://github.com/apache/flink/pull/744
[GitHub] flink pull request: [ml] [WIP] Consolidation of loss function
Github user tillrohrmann closed the pull request at: https://github.com/apache/flink/pull/740
[GitHub] flink pull request: [ml] [WIP] Consolidation of loss function
Github user tillrohrmann commented on the pull request: https://github.com/apache/flink/pull/740#issuecomment-107903012 I closed this PR to open it as a proper PR again.
[jira] [Commented] (FLINK-1430) Add test for streaming scala api completeness
[ https://issues.apache.org/jira/browse/FLINK-1430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14568874#comment-14568874 ] ASF GitHub Bot commented on FLINK-1430: --- Github user asfgit closed the pull request at: https://github.com/apache/flink/pull/753 Add test for streaming scala api completeness - Key: FLINK-1430 URL: https://issues.apache.org/jira/browse/FLINK-1430 Project: Flink Issue Type: Test Components: Streaming Affects Versions: 0.9 Reporter: Márton Balassi Assignee: Márton Balassi Currently the completeness of the streaming scala api is not tested.
[GitHub] flink pull request: [ml] Rework of the optimization framework
Github user tillrohrmann commented on the pull request: https://github.com/apache/flink/pull/758#issuecomment-107905323 This PR is the new PR for #740.
[jira] [Commented] (FLINK-2069) writeAsCSV function in DataStream Scala API creates no file
[ https://issues.apache.org/jira/browse/FLINK-2069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14568999#comment-14568999 ] Faye Beligianni commented on FLINK-2069: Hello, I am using a timeout of 1 in the {{iterate()}} call. You can find the code in which I tried to use the {{writeAsCsv}} function at this link: [https://github.com/fobeligi/incubator-flink/blob/inc-ml-test/flink-staging/flink-streaming/flink-streaming-ml/src/test/scala/org/apache/flink/streaming/incrementalML/test/classifier/HoeffdingTreeITSuite.scala] I am using this Scala test to run an implementation of the Hoeffding Tree algorithm. writeAsCSV function in DataStream Scala API creates no file --- Key: FLINK-2069 URL: https://issues.apache.org/jira/browse/FLINK-2069 Project: Flink Issue Type: Bug Components: Streaming Reporter: Faye Beligianni Priority: Blocker Labels: Streaming Fix For: 0.9 When the {{writeAsCSV}} function is used in the DataStream Scala API, no file is created in the specified path.
[jira] [Commented] (FLINK-1319) Add static code analysis for UDFs
[ https://issues.apache.org/jira/browse/FLINK-1319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14569150#comment-14569150 ] ASF GitHub Bot commented on FLINK-1319: --- Github user uce commented on a diff in the pull request: https://github.com/apache/flink/pull/729#discussion_r31527840 --- Diff: flink-core/src/main/java/org/apache/flink/api/common/UdfAnalysisMode.java --- @@ -0,0 +1,45 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * License); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an AS IS BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.flink.api.common; + +/** + * Specifies to which extent user-defined functions are analyzed in order + * to give the Flink optimizer an insight of UDF internals and inform + * the user about common implementation mistakes. + * + */ +public enum UdfAnalysisMode { --- End diff -- This is user-facing. I vote to rename it. @rmetzger agrees that in his experience the UDF part can be misleading. I understand why you chose this though... the operators make use of the UDF term all over the place. What about `CodeAnalysisMode`? After all both this PR and the package are called *code analysis* and not UDF analysis. 
Add static code analysis for UDFs - Key: FLINK-1319 URL: https://issues.apache.org/jira/browse/FLINK-1319 Project: Flink Issue Type: New Feature Components: Java API, Scala API Reporter: Stephan Ewen Assignee: Timo Walther Priority: Minor Flink's Optimizer takes information that tells it for UDFs which fields of the input elements are accessed, modified, or forwarded/copied. This information frequently helps to reuse partitionings, sorts, etc. It may speed up programs significantly, as it can frequently eliminate sorts and shuffles, which are costly. Right now, users can add lightweight annotations to UDFs to provide this information (such as adding {{@ConstandFields(0-3, 1, 2-1)}}). We worked with static code analysis of UDFs before, to determine this information automatically. This is an incredible feature, as it magically makes programs faster. For record-at-a-time operations (Map, Reduce, FlatMap, Join, Cross), this works surprisingly well in many cases. We used the Soot toolkit for the static code analysis. Unfortunately, Soot is LGPL licensed and thus we did not include any of the code so far. I propose to add this functionality to Flink, in the form of a drop-in addition, to work around the LGPL incompatibility with ASL 2.0. Users could simply download a special flink-code-analysis.jar and drop it into the lib folder to enable this functionality. We may even add a script to tools that downloads that library automatically into the lib folder. This should be legally fine, since we do not redistribute LGPL code and only dynamically link it (the incompatibility with ASL 2.0 is mainly in the patentability, if I remember correctly). Prior work on this has been done by [~aljoscha] and [~skunert], which could provide a code base to start with.
*Appendix* Homepage of the Soot static analysis toolkit: http://www.sable.mcgill.ca/soot/ Papers on static analysis and for optimization: http://stratosphere.eu/assets/papers/EnablingOperatorReorderingSCA_12.pdf and http://stratosphere.eu/assets/papers/openingTheBlackBoxes_12.pdf Quick introduction to the Optimizer: http://stratosphere.eu/assets/papers/2014-VLDBJ_Stratosphere_Overview.pdf (Section 6) Optimizer for Iterations: http://stratosphere.eu/assets/papers/spinningFastIterativeDataFlows_12.pdf (Sections 4.3 and 5.3)
[GitHub] flink pull request: [FLINK-1319][core] Add static code analysis fo...
Github user uce commented on a diff in the pull request: https://github.com/apache/flink/pull/729#discussion_r31527840 --- Diff: flink-core/src/main/java/org/apache/flink/api/common/UdfAnalysisMode.java --- @@ -0,0 +1,45 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * License); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an AS IS BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.flink.api.common; + +/** + * Specifies to which extent user-defined functions are analyzed in order + * to give the Flink optimizer an insight of UDF internals and inform + * the user about common implementation mistakes. + * + */ +public enum UdfAnalysisMode { --- End diff -- This is user-facing. I vote to rename it. @rmetzger agrees that in his experience the UDF part can be misleading. I understand why you chose this though... the operators make use of the UDF term all over the place. What about `CodeAnalysisMode`? After all both this PR and the package are called *code analysis* and not UDF analysis.
[jira] [Commented] (FLINK-1319) Add static code analysis for UDFs
[ https://issues.apache.org/jira/browse/FLINK-1319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14569154#comment-14569154 ] ASF GitHub Bot commented on FLINK-1319: --- Github user uce commented on a diff in the pull request: https://github.com/apache/flink/pull/729#discussion_r31528084 --- Diff: flink-core/src/main/java/org/apache/flink/api/common/UdfAnalysisMode.java --- @@ -0,0 +1,45 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * License); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an AS IS BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.flink.api.common; + +/** + * Specifies to which extent user-defined functions are analyzed in order --- End diff -- - I would make this more concrete. What about the list from your initial PR comment? - line 25: empty
[jira] [Commented] (FLINK-2130) RabbitMQ source does not fail when failing to retrieve elements
[ https://issues.apache.org/jira/browse/FLINK-2130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14569132#comment-14569132 ] Robert Metzger commented on FLINK-2130: --- I would let the source fail. All the other components in Flink also always fail immediately. I guess there are ways to configure retries at the RabbitMQ connector. RabbitMQ source does not fail when failing to retrieve elements --- Key: FLINK-2130 URL: https://issues.apache.org/jira/browse/FLINK-2130 Project: Flink Issue Type: Bug Components: Streaming, Streaming Connectors Reporter: Stephan Ewen The RMQ source only logs when elements cannot be retrieved. Failures are not propagated.
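The retry-then-fail behavior suggested above can be sketched generically (the class and method names are illustrative, not the actual RabbitMQ connector API): retry a bounded number of times, then propagate the failure instead of only logging it.

```java
import java.util.function.Supplier;

// Illustrative fail-fast fetch helper; not the Flink connector code.
class FailFastFetcher {
    /** Retries the fetch up to maxAttempts times, then fails the source. */
    static <T> T fetchWithRetries(Supplier<T> fetch, int maxAttempts) {
        RuntimeException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return fetch.get();
            } catch (RuntimeException e) {
                last = e;  // remember the cause; a real source would also log it
            }
        }
        // all attempts exhausted: rethrow so the job sees the error
        throw new RuntimeException(
                "fetch failed after " + maxAttempts + " attempts", last);
    }
}
```

The key point matches the comment: transient failures can be retried locally, but a persistent failure must surface as an exception so the job fails rather than silently producing no data.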
[jira] [Commented] (FLINK-2069) writeAsCSV function in DataStream Scala API creates no file
[ https://issues.apache.org/jira/browse/FLINK-2069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14569137#comment-14569137 ] Aljoscha Krettek commented on FLINK-2069: - Ha! I can finally reproduce it. No fix yet, though.
[GitHub] flink pull request: [FLINK-1319][core] Add static code analysis fo...
Github user uce commented on a diff in the pull request: https://github.com/apache/flink/pull/729#discussion_r31527970 --- Diff: flink-core/src/main/java/org/apache/flink/api/common/UdfAnalysisMode.java --- @@ -0,0 +1,45 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * License); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an AS IS BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.flink.api.common; + +/** + * Specifies to which extent user-defined functions are analyzed in order + * to give the Flink optimizer an insight of UDF internals and inform + * the user about common implementation mistakes. + * + */ +public enum UdfAnalysisMode { + + /** + * UDF analysis does not take place. + */ + DISABLED, + + /** + * Hints for improvement of the program are printed to the log. + */ + HINTING_ENABLED, + + /** + * The program will be automatically optimized with knowledge from UDF + * analysis. + */ + OPTIMIZING_ENABLED; + +} --- End diff -- Since the user will have to set this, what about keeping it short? `DISABLE`, `HINT`, `OPTIMIZE`?
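The argument for short constants is that users type them in configuration. A minimal sketch of that mapping, using the shortened names proposed in the review (the enum name `CodeAnalysisMode` follows the earlier review comment; the parsing helper is hypothetical):

```java
// Illustrative enum with the short, user-facing constants suggested in
// the review; fromConfig is a hypothetical helper, not Flink API.
enum CodeAnalysisMode {
    DISABLE,
    HINT,
    OPTIMIZE;

    /** Maps a user-supplied config string onto the enum, case-insensitively. */
    static CodeAnalysisMode fromConfig(String value) {
        return valueOf(value.trim().toUpperCase(java.util.Locale.ROOT));
    }
}
```

With short constants, a config entry reads naturally (e.g. `hint`), whereas long names like `HINTING_ENABLED` are harder to remember and type correctly.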
[jira] [Commented] (FLINK-2069) writeAsCSV function in DataStream Scala API creates no file
[ https://issues.apache.org/jira/browse/FLINK-2069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14569184#comment-14569184 ] Aljoscha Krettek commented on FLINK-2069: - I think I finally fixed it. Could you try this code: https://github.com/apache/flink/pull/759
[jira] [Commented] (FLINK-2069) writeAsCSV function in DataStream Scala API creates no file
[ https://issues.apache.org/jira/browse/FLINK-2069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14569183#comment-14569183 ] ASF GitHub Bot commented on FLINK-2069: --- GitHub user aljoscha opened a pull request: https://github.com/apache/flink/pull/759 [FLINK-2069] Fix Scala CSV Output Format Before, the Scala Streaming API used the Java CsvOutputFormat. This is not compatible with Scala Tuples. The FileSinkFunction would silently swallow the thrown exceptions and the job would finish cleanly. You can merge this pull request into a Git repository by running: $ git pull https://github.com/aljoscha/flink fix-scala-csv-output Alternatively you can review and apply these changes as the patch at: https://github.com/apache/flink/pull/759.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #759 commit 16986ced8b6961bb2ada269b29dd4a5b59159983 Author: Aljoscha Krettek aljoscha.kret...@gmail.com Date: 2015-06-02T14:37:06Z [FLINK-2069] Fix Scala CSV Output Format
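The failure mode described in the PR, a sink that swallows exceptions so the job "finishes cleanly" with no output, can be sketched in a simplified form (this is illustrative, not the actual FileSinkFunction code):

```java
import java.util.ArrayList;
import java.util.List;

// Simplified sketch contrasting a sink that swallows write failures
// with one that propagates them; not the Flink sink code.
class Sink {
    final List<String> written = new ArrayList<>();

    /** Buggy variant: failures vanish and the job appears to succeed. */
    void writeSwallowing(String record, boolean failing) {
        try {
            if (failing) throw new RuntimeException("incompatible record type");
            written.add(record);
        } catch (RuntimeException e) {
            // swallowed: the job completes cleanly, but nothing was written
        }
    }

    /** Fixed variant: the failure surfaces and the job fails visibly. */
    void writePropagating(String record, boolean failing) {
        if (failing) throw new RuntimeException("incompatible record type");
        written.add(record);
    }
}
```

This is why the bug presented as "no file is created" rather than as an error: the incompatibility between the Java CsvOutputFormat and Scala tuples threw on every record, and the swallowed exceptions left an empty result with a clean exit.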
[GitHub] flink pull request: [FLINK-1319][core] Add static code analysis fo...
Github user uce commented on a diff in the pull request: https://github.com/apache/flink/pull/729#discussion_r31530742 --- Diff: flink-java/src/main/java/org/apache/flink/api/java/operators/SingleInputUdfOperator.java --- @@ -54,8 +54,11 @@ private Map<String, DataSet<?>> broadcastVariables; + // NOTE: only set this variable via setSemanticProperties() --- End diff -- I think this refactoring is quite fragile. The semantic properties utility is not returning an empty properties object, but null, and you take care of setting it correctly here depending on whether the forwarded fields have been set manually or not. If optimize is enabled and there are manual annotations, they will be overridden. I am wondering if it is better to have manual annotations trump optimizer annotations. What's your opinion on this?
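The precedence rule proposed in the review, manual annotations trumping optimizer-inferred ones, can be sketched as a simple merge (the types below are illustrative; Flink's semantic properties are richer than a field-index map):

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative merge of forwarded-field mappings (sourceField -> targetField);
// not Flink's actual SemanticProperties implementation.
class SemanticMerge {
    /** Manual annotations take precedence over analyzer results. */
    static Map<Integer, Integer> merge(Map<Integer, Integer> manual,
                                       Map<Integer, Integer> inferred) {
        Map<Integer, Integer> result = new LinkedHashMap<>(inferred);
        result.putAll(manual);  // manual entries overwrite inferred ones
        return result;
    }
}
```

Starting from the inferred map and overwriting with the manual one encodes the suggested rule directly: analysis fills in whatever the user did not declare, and an explicit declaration is never silently overridden.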
[jira] [Updated] (FLINK-2130) RabbitMQ source does not fail when failing to retrieve elements
[ https://issues.apache.org/jira/browse/FLINK-2130?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Metzger updated FLINK-2130: -- Component/s: Streaming Connectors Streaming
[jira] [Commented] (FLINK-1319) Add static code analysis for UDFs
[ https://issues.apache.org/jira/browse/FLINK-1319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14569152#comment-14569152 ] ASF GitHub Bot commented on FLINK-1319: --- Github user uce commented on a diff in the pull request: https://github.com/apache/flink/pull/729#discussion_r31527970 --- Diff: flink-core/src/main/java/org/apache/flink/api/common/UdfAnalysisMode.java --- @@ -0,0 +1,45 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * License); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an AS IS BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.flink.api.common; + +/** + * Specifies to which extent user-defined functions are analyzed in order + * to give the Flink optimizer an insight of UDF internals and inform + * the user about common implementation mistakes. + * + */ +public enum UdfAnalysisMode { + + /** + * UDF analysis does not take place. + */ + DISABLED, + + /** + * Hints for improvement of the program are printed to the log. + */ + HINTING_ENABLED, + + /** + * The program will be automatically optimized with knowledge from UDF + * analysis. + */ + OPTIMIZING_ENABLED; + +} --- End diff -- Since the user will have to set this, what about keeping it short? `DISABLE`, `HINT`, `OPTIMIZE`?
[jira] [Commented] (FLINK-1319) Add static code analysis for UDFs
[ https://issues.apache.org/jira/browse/FLINK-1319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14569202#comment-14569202 ] ASF GitHub Bot commented on FLINK-1319: --- Github user uce commented on a diff in the pull request: https://github.com/apache/flink/pull/729#discussion_r31530742 --- Diff: flink-java/src/main/java/org/apache/flink/api/java/operators/SingleInputUdfOperator.java --- @@ -54,8 +54,11 @@ private MapString, DataSet? broadcastVariables; + // NOTE: only set this variable via setSemanticProperties() --- End diff -- I think this refactoring is quite fragile. The semantic properties utility is not returning an empty properties object, but null and you take care of setting it correctly here depending on whether the forwarded fields have been set manually or not. If optimize is enabled and there are manual annotations, they will be overriden. I am wondering if it is better to have manual annotations trump optimizer annotations. What's your opinion on this? Add static code analysis for UDFs - Key: FLINK-1319 URL: https://issues.apache.org/jira/browse/FLINK-1319 Project: Flink Issue Type: New Feature Components: Java API, Scala API Reporter: Stephan Ewen Assignee: Timo Walther Priority: Minor Flink's Optimizer takes information that tells it for UDFs which fields of the input elements are accessed, modified, or frwarded/copied. This information frequently helps to reuse partitionings, sorts, etc. It may speed up programs significantly, as it can frequently eliminate sorts and shuffles, which are costly. Right now, users can add lightweight annotations to UDFs to provide this information (such as adding {{@ConstandFields(0-3, 1, 2-1)}}. We worked with static code analysis of UDFs before, to determine this information automatically. This is an incredible feature, as it magically makes programs faster. For record-at-a-time operations (Map, Reduce, FlatMap, Join, Cross), this works surprisingly well in many cases. 
We used the Soot toolkit for the static code analysis. Unfortunately, Soot is LGPL licensed and thus we did not include any of the code so far. I propose to add this functionality to Flink in the form of a drop-in addition, to work around the LGPL incompatibility with ASL 2.0. Users could simply download a special flink-code-analysis.jar and drop it into the lib folder to enable this functionality. We may even add a script to tools that downloads that library automatically into the lib folder. This should be legally fine, since we do not redistribute LGPL code and only dynamically link it (the incompatibility with ASL 2.0 is mainly in the patentability, if I remember correctly). Prior work on this has been done by [~aljoscha] and [~skunert], which could provide a code base to start with. *Appendix* Homepage of the Soot static analysis toolkit: http://www.sable.mcgill.ca/soot/ Papers on static analysis and for optimization: http://stratosphere.eu/assets/papers/EnablingOperatorReorderingSCA_12.pdf and http://stratosphere.eu/assets/papers/openingTheBlackBoxes_12.pdf Quick introduction to the Optimizer: http://stratosphere.eu/assets/papers/2014-VLDBJ_Stratosphere_Overview.pdf (Section 6) Optimizer for Iterations: http://stratosphere.eu/assets/papers/spinningFastIterativeDataFlows_12.pdf (Sections 4.3 and 5.3) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
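The precedence question raised in the review comment above (should manual forwarded-field annotations trump optimizer-derived ones?) can be sketched as follows. This is an illustrative helper only; the class and method names are assumptions, not Flink's actual `SingleInputUdfOperator` code:

```java
import java.util.Optional;

// Illustrative sketch: resolving semantic properties so that manually
// set annotations always trump analyzer-derived ones. Hypothetical
// names, not Flink's API.
public class SemanticPropsResolver {

    /** Returns the manual annotation if present, else the analyzer result (which may be null). */
    public static String resolve(String manualAnnotation, String analyzerResult) {
        // Manual annotations win; the static-analysis result is only a
        // fallback, so enabling the analyzer never overrides user hints.
        return Optional.ofNullable(manualAnnotation).orElse(analyzerResult);
    }
}
```

With this precedence, the fragility the reviewer describes disappears: enabling optimization can only add information where the user provided none.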
[GitHub] flink pull request: [FLINK-2111] Add terminate signal to cleanly...
Github user mjsax commented on the pull request: https://github.com/apache/flink/pull/750#issuecomment-107926235 Hi, after thinking about the signal name again, I disagree. terminate does not mean kill, but is a request to gracefully shut down. stop, from my point of view, is too soft, as it does not indicate strongly enough that the job is completely shut down and cleared. I would interpret stop more as an interrupt signal (with a corresponding re-start signal that wakes the job up again). Any other opinions? Maybe we need a vote on the signal name. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] flink pull request: [FLINK-1319][core] Add static code analysis fo...
Github user uce commented on a diff in the pull request: https://github.com/apache/flink/pull/729#discussion_r31528084 --- Diff: flink-core/src/main/java/org/apache/flink/api/common/UdfAnalysisMode.java --- @@ -0,0 +1,45 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.flink.api.common; + +/** + * Specifies to which extent user-defined functions are analyzed in order --- End diff -- - I would make this more concrete. What about the list from your initial PR comment? - line 25: empty
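For context, an analysis-mode switch of the kind under review might take the following shape. This is a hedged sketch; the constant names are assumptions, not the actual contents of the pull request's `UdfAnalysisMode`:

```java
// Hedged sketch of an analysis-mode enum like the reviewed
// UdfAnalysisMode. The constants here are illustrative assumptions.
public enum AnalysisMode {
    DISABLED,   // skip static code analysis entirely
    HINTING,    // analyze and report hints without changing the program
    OPTIMIZING; // analyze and feed derived annotations to the optimizer

    /** Whether any form of static analysis runs in this mode. */
    public boolean isEnabled() {
        return this != DISABLED;
    }
}
```

Splitting "analyze and report" from "analyze and apply" keeps the magical-speedup path opt-in while still letting users discover what the analyzer would do.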
[jira] [Commented] (FLINK-2069) writeAsCSV function in DataStream Scala API creates no file
[ https://issues.apache.org/jira/browse/FLINK-2069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14569179#comment-14569179 ] ASF GitHub Bot commented on FLINK-2069: --- Github user aljoscha closed the pull request at: https://github.com/apache/flink/pull/724 writeAsCSV function in DataStream Scala API creates no file --- Key: FLINK-2069 URL: https://issues.apache.org/jira/browse/FLINK-2069 Project: Flink Issue Type: Bug Components: Streaming Reporter: Faye Beligianni Priority: Blocker Labels: Streaming Fix For: 0.9 When the {{writeAsCSV}} function is used in the DataStream Scala API, no file is created in the specified path. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[GitHub] flink pull request: [FLINK-2069] Fix Scala CSV Output Format
GitHub user aljoscha opened a pull request: https://github.com/apache/flink/pull/759 [FLINK-2069] Fix Scala CSV Output Format Before, the Scala Streaming API used the Java CsvOutputFormat. This is not compatible with Scala Tuples. The FileSinkFunction would silently swallow the thrown exceptions and the job would finish cleanly. You can merge this pull request into a Git repository by running: $ git pull https://github.com/aljoscha/flink fix-scala-csv-output Alternatively you can review and apply these changes as the patch at: https://github.com/apache/flink/pull/759.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #759 commit 16986ced8b6961bb2ada269b29dd4a5b59159983 Author: Aljoscha Krettek aljoscha.kret...@gmail.com Date: 2015-06-02T14:37:06Z [FLINK-2069] Fix Scala CSV Output Format
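The failure mode the PR describes (exceptions swallowed by the sink, so the job finishes cleanly while writing no file) can be illustrated with a minimal sketch. The class below is hypothetical and stands in for the behavior, not for Flink's actual FileSinkFunction:

```java
// Minimal illustration of a sink that swallows write failures: the job
// "succeeds" while producing no output. Hypothetical class, not
// Flink's actual FileSinkFunction or CsvOutputFormat.
public class SwallowingSink {
    private int recordsWritten = 0;
    private final boolean swallowExceptions;

    public SwallowingSink(boolean swallowExceptions) {
        this.swallowExceptions = swallowExceptions;
    }

    public void invoke(Object record) {
        try {
            write(record);
            recordsWritten++;
        } catch (RuntimeException e) {
            if (!swallowExceptions) {
                throw e; // propagating makes the failure visible and fails the job
            }
            // swallowed: the caller never learns that nothing was written
        }
    }

    // Stands in for an output format incompatible with the record type
    // (here: the Java CsvOutputFormat fed Scala tuples).
    private void write(Object record) {
        throw new RuntimeException("format cannot serialize record: " + record);
    }

    public int getRecordsWritten() {
        return recordsWritten;
    }
}
```

With swallowing enabled, zero records are written yet no error surfaces, which matches the reported symptom of "no file is created in the specified path".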
[GitHub] flink pull request: [ml] Rework of the optimization framework
Github user asfgit closed the pull request at: https://github.com/apache/flink/pull/758
[jira] [Commented] (FLINK-2133) Possible deadlock in ExecutionGraph
[ https://issues.apache.org/jira/browse/FLINK-2133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14569035#comment-14569035 ] Aljoscha Krettek commented on FLINK-2133: - Also, this is on my checkpoint-hardening branch: https://github.com/aljoscha/flink/tree/checkpoint-hardening I don't know if this also occurs on master. I'm not seeing this often, most of the travis runs go through. Possible deadlock in ExecutionGraph --- Key: FLINK-2133 URL: https://issues.apache.org/jira/browse/FLINK-2133 Project: Flink Issue Type: Bug Reporter: Aljoscha Krettek I had the following output on Travis: {code} Found one Java-level deadlock: = ForkJoinPool-1-worker-3: waiting to lock monitor 0x7f1c54af7eb8 (object 0xd77fa8c0, a org.apache.flink.runtime.util.SerializableObject), which is held by flink-akka.actor.default-dispatcher-4 flink-akka.actor.default-dispatcher-4: waiting to lock monitor 0x7f1c5486aca0 (object 0xd77fa218, a org.apache.flink.runtime.util.SerializableObject), which is held by ForkJoinPool-1-worker-3 Java stack information for the threads listed above: === ForkJoinPool-1-worker-3: at org.apache.flink.runtime.executiongraph.ExecutionJobVertex.resetForNewExecution(ExecutionJobVertex.java:338) - waiting to lock 0xd77fa8c0 (a org.apache.flink.runtime.util.SerializableObject) at org.apache.flink.runtime.executiongraph.ExecutionGraph.restart(ExecutionGraph.java:595) - locked 0xd77fa218 (a org.apache.flink.runtime.util.SerializableObject) at org.apache.flink.runtime.executiongraph.ExecutionGraph$3.call(ExecutionGraph.java:733) at akka.dispatch.Futures$$anonfun$future$1.apply(Future.scala:94) at scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24) at scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24) at scala.concurrent.impl.ExecutionContextImpl$$anon$3.exec(ExecutionContextImpl.scala:107) at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260) at 
scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339) at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979) at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107) flink-akka.actor.default-dispatcher-4: at org.apache.flink.runtime.executiongraph.ExecutionGraph.jobVertexInFinalState(ExecutionGraph.java:683) - waiting to lock 0xd77fa218 (a org.apache.flink.runtime.util.SerializableObject) at org.apache.flink.runtime.executiongraph.ExecutionJobVertex.subtaskInFinalState(ExecutionJobVertex.java:454) - locked 0xd77fa8c0 (a org.apache.flink.runtime.util.SerializableObject) at org.apache.flink.runtime.executiongraph.ExecutionJobVertex.vertexCancelled(ExecutionJobVertex.java:426) at org.apache.flink.runtime.executiongraph.ExecutionVertex.executionCanceled(ExecutionVertex.java:565) at org.apache.flink.runtime.executiongraph.Execution.cancelingComplete(Execution.java:653) at org.apache.flink.runtime.executiongraph.ExecutionGraph.updateState(ExecutionGraph.java:784) at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$receiveWithLogMessages$1$$anonfun$applyOrElse$2.apply$mcV$sp(JobManager.scala:220) at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$receiveWithLogMessages$1$$anonfun$applyOrElse$2.apply(JobManager.scala:219) at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$receiveWithLogMessages$1$$anonfun$applyOrElse$2.apply(JobManager.scala:219) at scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24) at scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24) at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:41) at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:401) at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260) at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.pollAndExecAll(ForkJoinPool.java:1253) at 
scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1346) at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979) at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107) Found 1 deadlock. {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-1319) Add static code analysis for UDFs
[ https://issues.apache.org/jira/browse/FLINK-1319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14569042#comment-14569042 ] ASF GitHub Bot commented on FLINK-1319: --- Github user rmetzger commented on the pull request: https://github.com/apache/flink/pull/729#issuecomment-107941730 Naming UDF. The feedback of committers giving talks about Flink some time ago was that the name UDF was sometimes confusing. @rmetzger, can you confirm this? We might take this into account and rename the UdfAnalysisMode to something else, for example just CodeAnalysisMode. That's right. People associate UDFs with SQL databases that allow passing in custom functions (which is right, but they start thinking Flink is a SQL database). In this case, it's not super critical because it's internal code. Add static code analysis for UDFs - Key: FLINK-1319 URL: https://issues.apache.org/jira/browse/FLINK-1319 Project: Flink Issue Type: New Feature Components: Java API, Scala API Reporter: Stephan Ewen Assignee: Timo Walther Priority: Minor Flink's Optimizer takes information that tells it, for UDFs, which fields of the input elements are accessed, modified, or forwarded/copied. This information frequently helps to reuse partitionings, sorts, etc. It may speed up programs significantly, as it can frequently eliminate sorts and shuffles, which are costly. Right now, users can add lightweight annotations to UDFs to provide this information (such as adding {{@ConstantFields(0->3, 1, 2->1)}}). We worked with static code analysis of UDFs before to determine this information automatically. This is an incredible feature, as it magically makes programs faster. For record-at-a-time operations (Map, Reduce, FlatMap, Join, Cross), this works surprisingly well in many cases. We used the Soot toolkit for the static code analysis. Unfortunately, Soot is LGPL licensed and thus we did not include any of the code so far. 
I propose to add this functionality to Flink in the form of a drop-in addition, to work around the LGPL incompatibility with ASL 2.0. Users could simply download a special flink-code-analysis.jar and drop it into the lib folder to enable this functionality. We may even add a script to tools that downloads that library automatically into the lib folder. This should be legally fine, since we do not redistribute LGPL code and only dynamically link it (the incompatibility with ASL 2.0 is mainly in the patentability, if I remember correctly). Prior work on this has been done by [~aljoscha] and [~skunert], which could provide a code base to start with. *Appendix* Homepage of the Soot static analysis toolkit: http://www.sable.mcgill.ca/soot/ Papers on static analysis and for optimization: http://stratosphere.eu/assets/papers/EnablingOperatorReorderingSCA_12.pdf and http://stratosphere.eu/assets/papers/openingTheBlackBoxes_12.pdf Quick introduction to the Optimizer: http://stratosphere.eu/assets/papers/2014-VLDBJ_Stratosphere_Overview.pdf (Section 6) Optimizer for Iterations: http://stratosphere.eu/assets/papers/spinningFastIterativeDataFlows_12.pdf (Sections 4.3 and 5.3) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-2133) Possible deadlock in ExecutionGraph
[ https://issues.apache.org/jira/browse/FLINK-2133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14569062#comment-14569062 ] Ufuk Celebi commented on FLINK-2133: I've looked at the ExecutionGraph and this seems to be a simple deadlock due to the ordering of lock acquisitions. Two tasks of the same JobVertex acquire the locks in the following order: - T1 (ForkJoinPool-1-worker-3): ExecutionGraph#restart() acquires ExecutionGraph#progressLock => ExecutionJobVertex#reset() acquires ExecutionJobVertex#stateMonitor - T2 (flink-akka.actor.default-dispatcher-4): ExecutionJobVertex#subtaskInFinalState acquires ExecutionJobVertex#stateMonitor to cancel task => ExecutionGraph#jobVertexInFinalState() acquires ExecutionGraph#progressLock I think that both messages have to be triggered by the same task, because both actions should only happen for the final vertex (I think cancel (transition to cancelling) and canceling complete msg (transition to cancelled)). Possible deadlock in ExecutionGraph --- Key: FLINK-2133 URL: https://issues.apache.org/jira/browse/FLINK-2133 Project: Flink Issue Type: Bug Reporter: Aljoscha Krettek I had the following output on Travis: {code} Found one Java-level deadlock: = ForkJoinPool-1-worker-3: waiting to lock monitor 0x7f1c54af7eb8 (object 0xd77fa8c0, a org.apache.flink.runtime.util.SerializableObject), which is held by flink-akka.actor.default-dispatcher-4 flink-akka.actor.default-dispatcher-4: waiting to lock monitor 0x7f1c5486aca0 (object 0xd77fa218, a org.apache.flink.runtime.util.SerializableObject), which is held by ForkJoinPool-1-worker-3 Java stack information for the threads listed above: === ForkJoinPool-1-worker-3: at org.apache.flink.runtime.executiongraph.ExecutionJobVertex.resetForNewExecution(ExecutionJobVertex.java:338) - waiting to lock 0xd77fa8c0 (a org.apache.flink.runtime.util.SerializableObject) at org.apache.flink.runtime.executiongraph.ExecutionGraph.restart(ExecutionGraph.java:595) - 
locked 0xd77fa218 (a org.apache.flink.runtime.util.SerializableObject) at org.apache.flink.runtime.executiongraph.ExecutionGraph$3.call(ExecutionGraph.java:733) at akka.dispatch.Futures$$anonfun$future$1.apply(Future.scala:94) at scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24) at scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24) at scala.concurrent.impl.ExecutionContextImpl$$anon$3.exec(ExecutionContextImpl.scala:107) at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260) at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339) at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979) at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107) flink-akka.actor.default-dispatcher-4: at org.apache.flink.runtime.executiongraph.ExecutionGraph.jobVertexInFinalState(ExecutionGraph.java:683) - waiting to lock 0xd77fa218 (a org.apache.flink.runtime.util.SerializableObject) at org.apache.flink.runtime.executiongraph.ExecutionJobVertex.subtaskInFinalState(ExecutionJobVertex.java:454) - locked 0xd77fa8c0 (a org.apache.flink.runtime.util.SerializableObject) at org.apache.flink.runtime.executiongraph.ExecutionJobVertex.vertexCancelled(ExecutionJobVertex.java:426) at org.apache.flink.runtime.executiongraph.ExecutionVertex.executionCanceled(ExecutionVertex.java:565) at org.apache.flink.runtime.executiongraph.Execution.cancelingComplete(Execution.java:653) at org.apache.flink.runtime.executiongraph.ExecutionGraph.updateState(ExecutionGraph.java:784) at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$receiveWithLogMessages$1$$anonfun$applyOrElse$2.apply$mcV$sp(JobManager.scala:220) at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$receiveWithLogMessages$1$$anonfun$applyOrElse$2.apply(JobManager.scala:219) at 
org.apache.flink.runtime.jobmanager.JobManager$$anonfun$receiveWithLogMessages$1$$anonfun$applyOrElse$2.apply(JobManager.scala:219) at scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24) at scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24) at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:41) at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:401) at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
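The deadlock reported above is the classic hold-and-wait cycle: the two threads take the same pair of monitors (`progressLock` on the ExecutionGraph, `stateMonitor` on the ExecutionJobVertex) in opposite orders. The standard fix is a single global acquisition order; a minimal sketch follows, with lock names mirroring the report but everything else illustrative, not the actual ExecutionGraph code:

```java
// Sketch of the lock-ordering fix: both code paths acquire
// progressLock before stateMonitor, so a hold-and-wait cycle between
// the two monitors is impossible. Illustrative only.
public class OrderedLocks {
    private final Object progressLock = new Object(); // ExecutionGraph-level
    private final Object stateMonitor = new Object(); // ExecutionJobVertex-level

    public String restart() {
        synchronized (progressLock) {
            synchronized (stateMonitor) {
                return "restarted";
            }
        }
    }

    public String subtaskInFinalState() {
        // In the reported trace this path locked stateMonitor first;
        // taking progressLock first matches restart() and breaks the cycle.
        synchronized (progressLock) {
            synchronized (stateMonitor) {
                return "final";
            }
        }
    }
}
```

With both paths agreeing on the order, whichever thread gets `progressLock` first can always proceed to `stateMonitor` without waiting on the other.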
[jira] [Commented] (FLINK-2130) RabbitMQ source does not fail when failing to retrieve elements
[ https://issues.apache.org/jira/browse/FLINK-2130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14569096#comment-14569096 ] Márton Balassi commented on FLINK-2130: --- This was a conscious design choice, so that we do not fail a whole source task just because one single element was not read properly. Open and close failures are propagated. Would you suggest doing this for every failed message? RabbitMQ source does not fail when failing to retrieve elements --- Key: FLINK-2130 URL: https://issues.apache.org/jira/browse/FLINK-2130 Project: Flink Issue Type: Bug Reporter: Stephan Ewen The RMQ source only logs when elements cannot be retrieved. Failures are not propagated. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
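The design question in the comment above, log-and-skip versus propagate, can be sketched as a policy switch. All names here are hypothetical illustrations, not the RMQ source's actual API:

```java
// Hypothetical sketch of per-element failure handling in a source:
// LOG_AND_SKIP keeps the task alive past bad elements, PROPAGATE fails
// the whole source task on the first bad element.
public class FailurePolicyDemo {
    public enum OnFailure { LOG_AND_SKIP, PROPAGATE }

    /** Returns the number of elements emitted; a null message stands in for a failed read. */
    public static int consume(String[] messages, OnFailure policy) {
        int emitted = 0;
        for (String msg : messages) {
            try {
                if (msg == null) {
                    throw new RuntimeException("failed to retrieve element");
                }
                emitted++; // stands in for collector.collect(msg)
            } catch (RuntimeException e) {
                if (policy == OnFailure.PROPAGATE) {
                    throw e; // the source task fails visibly
                }
                // LOG_AND_SKIP: log the error and continue consuming
            }
        }
        return emitted;
    }
}
```

A middle ground, also worth considering, is making the policy configurable per source rather than hard-coding either behavior.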
[jira] [Commented] (FLINK-2133) Possible deadlock in ExecutionGraph
[ https://issues.apache.org/jira/browse/FLINK-2133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14569032#comment-14569032 ] Aljoscha Krettek commented on FLINK-2133: - This was the whole travis run: https://travis-ci.org/aljoscha/flink/jobs/65039616 Possible deadlock in ExecutionGraph --- Key: FLINK-2133 URL: https://issues.apache.org/jira/browse/FLINK-2133 Project: Flink Issue Type: Bug Reporter: Aljoscha Krettek I had the following output on Travis: {code} Found one Java-level deadlock: = ForkJoinPool-1-worker-3: waiting to lock monitor 0x7f1c54af7eb8 (object 0xd77fa8c0, a org.apache.flink.runtime.util.SerializableObject), which is held by flink-akka.actor.default-dispatcher-4 flink-akka.actor.default-dispatcher-4: waiting to lock monitor 0x7f1c5486aca0 (object 0xd77fa218, a org.apache.flink.runtime.util.SerializableObject), which is held by ForkJoinPool-1-worker-3 Java stack information for the threads listed above: === ForkJoinPool-1-worker-3: at org.apache.flink.runtime.executiongraph.ExecutionJobVertex.resetForNewExecution(ExecutionJobVertex.java:338) - waiting to lock 0xd77fa8c0 (a org.apache.flink.runtime.util.SerializableObject) at org.apache.flink.runtime.executiongraph.ExecutionGraph.restart(ExecutionGraph.java:595) - locked 0xd77fa218 (a org.apache.flink.runtime.util.SerializableObject) at org.apache.flink.runtime.executiongraph.ExecutionGraph$3.call(ExecutionGraph.java:733) at akka.dispatch.Futures$$anonfun$future$1.apply(Future.scala:94) at scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24) at scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24) at scala.concurrent.impl.ExecutionContextImpl$$anon$3.exec(ExecutionContextImpl.scala:107) at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260) at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339) at 
scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979) at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107) flink-akka.actor.default-dispatcher-4: at org.apache.flink.runtime.executiongraph.ExecutionGraph.jobVertexInFinalState(ExecutionGraph.java:683) - waiting to lock 0xd77fa218 (a org.apache.flink.runtime.util.SerializableObject) at org.apache.flink.runtime.executiongraph.ExecutionJobVertex.subtaskInFinalState(ExecutionJobVertex.java:454) - locked 0xd77fa8c0 (a org.apache.flink.runtime.util.SerializableObject) at org.apache.flink.runtime.executiongraph.ExecutionJobVertex.vertexCancelled(ExecutionJobVertex.java:426) at org.apache.flink.runtime.executiongraph.ExecutionVertex.executionCanceled(ExecutionVertex.java:565) at org.apache.flink.runtime.executiongraph.Execution.cancelingComplete(Execution.java:653) at org.apache.flink.runtime.executiongraph.ExecutionGraph.updateState(ExecutionGraph.java:784) at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$receiveWithLogMessages$1$$anonfun$applyOrElse$2.apply$mcV$sp(JobManager.scala:220) at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$receiveWithLogMessages$1$$anonfun$applyOrElse$2.apply(JobManager.scala:219) at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$receiveWithLogMessages$1$$anonfun$applyOrElse$2.apply(JobManager.scala:219) at scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24) at scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24) at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:41) at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:401) at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260) at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.pollAndExecAll(ForkJoinPool.java:1253) at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1346) at 
scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979) at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107) Found 1 deadlock. {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Closed] (FLINK-1430) Add test for streaming scala api completeness
[ https://issues.apache.org/jira/browse/FLINK-1430?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Márton Balassi closed FLINK-1430. - Resolution: Implemented Fix Version/s: 0.9 Implemented via 50c818d Add test for streaming scala api completeness - Key: FLINK-1430 URL: https://issues.apache.org/jira/browse/FLINK-1430 Project: Flink Issue Type: Test Components: Streaming Affects Versions: 0.9 Reporter: Márton Balassi Assignee: Márton Balassi Fix For: 0.9 Currently the completeness of the streaming scala api is not tested. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (FLINK-2133) Possible deadlock in ExecutionGraph
Aljoscha Krettek created FLINK-2133: --- Summary: Possible deadlock in ExecutionGraph Key: FLINK-2133 URL: https://issues.apache.org/jira/browse/FLINK-2133 Project: Flink Issue Type: Bug Reporter: Aljoscha Krettek I had the following output on Travis: {code} Found one Java-level deadlock: = ForkJoinPool-1-worker-3: waiting to lock monitor 0x7f1c54af7eb8 (object 0xd77fa8c0, a org.apache.flink.runtime.util.SerializableObject), which is held by flink-akka.actor.default-dispatcher-4 flink-akka.actor.default-dispatcher-4: waiting to lock monitor 0x7f1c5486aca0 (object 0xd77fa218, a org.apache.flink.runtime.util.SerializableObject), which is held by ForkJoinPool-1-worker-3 Java stack information for the threads listed above: === ForkJoinPool-1-worker-3: at org.apache.flink.runtime.executiongraph.ExecutionJobVertex.resetForNewExecution(ExecutionJobVertex.java:338) - waiting to lock 0xd77fa8c0 (a org.apache.flink.runtime.util.SerializableObject) at org.apache.flink.runtime.executiongraph.ExecutionGraph.restart(ExecutionGraph.java:595) - locked 0xd77fa218 (a org.apache.flink.runtime.util.SerializableObject) at org.apache.flink.runtime.executiongraph.ExecutionGraph$3.call(ExecutionGraph.java:733) at akka.dispatch.Futures$$anonfun$future$1.apply(Future.scala:94) at scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24) at scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24) at scala.concurrent.impl.ExecutionContextImpl$$anon$3.exec(ExecutionContextImpl.scala:107) at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260) at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339) at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979) at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107) flink-akka.actor.default-dispatcher-4: at org.apache.flink.runtime.executiongraph.ExecutionGraph.jobVertexInFinalState(ExecutionGraph.java:683) - 
waiting to lock 0xd77fa218 (a org.apache.flink.runtime.util.SerializableObject) at org.apache.flink.runtime.executiongraph.ExecutionJobVertex.subtaskInFinalState(ExecutionJobVertex.java:454) - locked 0xd77fa8c0 (a org.apache.flink.runtime.util.SerializableObject) at org.apache.flink.runtime.executiongraph.ExecutionJobVertex.vertexCancelled(ExecutionJobVertex.java:426) at org.apache.flink.runtime.executiongraph.ExecutionVertex.executionCanceled(ExecutionVertex.java:565) at org.apache.flink.runtime.executiongraph.Execution.cancelingComplete(Execution.java:653) at org.apache.flink.runtime.executiongraph.ExecutionGraph.updateState(ExecutionGraph.java:784) at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$receiveWithLogMessages$1$$anonfun$applyOrElse$2.apply$mcV$sp(JobManager.scala:220) at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$receiveWithLogMessages$1$$anonfun$applyOrElse$2.apply(JobManager.scala:219) at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$receiveWithLogMessages$1$$anonfun$applyOrElse$2.apply(JobManager.scala:219) at scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24) at scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24) at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:41) at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:401) at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260) at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.pollAndExecAll(ForkJoinPool.java:1253) at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1346) at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979) at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107) Found 1 deadlock. {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (FLINK-2134) Deadlock in SuccessAfterNetworkBuffersFailureITCase
Ufuk Celebi created FLINK-2134: -- Summary: Deadlock in SuccessAfterNetworkBuffersFailureITCase Key: FLINK-2134 URL: https://issues.apache.org/jira/browse/FLINK-2134 Project: Flink Issue Type: Bug Affects Versions: master Reporter: Ufuk Celebi I ran into the issue in a Travis run for a PR: https://s3.amazonaws.com/archive.travis-ci.org/jobs/64994288/log.txt I can reproduce this locally by running SuccessAfterNetworkBuffersFailureITCase multiple times: {code} cluster = new ForkableFlinkMiniCluster(config, false); for (int i = 0; i < 100; i++) { // run test programs CC, KMeans, CC } {code} The iteration tasks wait for superstep notifications like this: {code} Join (Join at runConnectedComponents(SuccessAfterNetworkBuffersFailureITCase.java:128)) (8/6) daemon prio=5 tid=0x7f95f374f800 nid=0x138a7 in Object.wait() [0x000123f2a000] java.lang.Thread.State: TIMED_WAITING (on object monitor) at java.lang.Object.wait(Native Method) - waiting on 0x0007f89e3440 (a java.lang.Object) at org.apache.flink.runtime.iterative.concurrent.SuperstepKickoffLatch.awaitStartOfSuperstepOrTermination(SuperstepKickoffLatch.java:57) - locked 0x0007f89e3440 (a java.lang.Object) at org.apache.flink.runtime.iterative.task.IterationTailPactTask.run(IterationTailPactTask.java:131) at org.apache.flink.runtime.operators.RegularPactTask.invoke(RegularPactTask.java:362) at org.apache.flink.runtime.taskmanager.Task.run(Task.java:559) at java.lang.Thread.run(Thread.java:745) {code} I've asked [~rmetzger] to reproduce this and it deadlocks for him as well. The system needs to be under some load for this to occur after multiple runs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[GitHub] flink pull request: [FLINK-1319][core] Add static code analysis fo...
Github user rmetzger commented on the pull request: https://github.com/apache/flink/pull/729#issuecomment-107941730 Naming UDF. The feedback of committers giving talks about Flink some time ago was that the name UDF was sometimes confusing. @rmetzger, can you confirm this? We might take this into account and rename the UdfAnalysisMode to something else, for example just CodeAnalysisMode. That's right. People associate UDFs with SQL databases that allow passing in custom functions (which is right, but they start thinking Flink is a SQL database). In this case, it's not super critical because it's internal code.
[jira] [Resolved] (FLINK-2128) ScalaShellITSuite failing
[ https://issues.apache.org/jira/browse/FLINK-2128?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ufuk Celebi resolved FLINK-2128. Resolution: Duplicate ScalaShellITSuite failing - Key: FLINK-2128 URL: https://issues.apache.org/jira/browse/FLINK-2128 Project: Flink Issue Type: Bug Components: Scala Shell Affects Versions: master Reporter: Ufuk Celebi https://s3.amazonaws.com/archive.travis-ci.org/jobs/64947781/log.txt {code} ScalaShellITSuite: log4j:ERROR setFile(null,true) call failed. java.io.FileNotFoundException: /.log (Permission denied) at java.io.FileOutputStream.open(Native Method) at java.io.FileOutputStream.<init>(FileOutputStream.java:221) at java.io.FileOutputStream.<init>(FileOutputStream.java:142) at org.apache.log4j.FileAppender.setFile(FileAppender.java:294) at org.apache.log4j.FileAppender.activateOptions(FileAppender.java:165) at org.apache.log4j.config.PropertySetter.activate(PropertySetter.java:307) at org.apache.log4j.config.PropertySetter.setProperties(PropertySetter.java:172) at org.apache.log4j.config.PropertySetter.setProperties(PropertySetter.java:104) at org.apache.log4j.PropertyConfigurator.parseAppender(PropertyConfigurator.java:842) at org.apache.log4j.PropertyConfigurator.parseCategory(PropertyConfigurator.java:768) at org.apache.log4j.PropertyConfigurator.configureRootCategory(PropertyConfigurator.java:648) at org.apache.log4j.PropertyConfigurator.doConfigure(PropertyConfigurator.java:514) at org.apache.log4j.PropertyConfigurator.doConfigure(PropertyConfigurator.java:580) at org.apache.log4j.helpers.OptionConverter.selectAndConfigure(OptionConverter.java:526) at org.apache.log4j.LogManager.<clinit>(LogManager.java:127) at org.slf4j.impl.Log4jLoggerFactory.getLogger(Log4jLoggerFactory.java:66) at org.slf4j.LoggerFactory.getLogger(LoggerFactory.java:277) at org.slf4j.LoggerFactory.getLogger(LoggerFactory.java:288) at org.apache.flink.test.util.TestBaseUtils.<clinit>(TestBaseUtils.java:69) at 
org.apache.flink.api.scala.ScalaShellITSuite.beforeAll(ScalaShellITSuite.scala:198) at org.scalatest.BeforeAndAfterAll$class.beforeAll(BeforeAndAfterAll.scala:187) at org.apache.flink.api.scala.ScalaShellITSuite.beforeAll(ScalaShellITSuite.scala:33) at org.scalatest.BeforeAndAfterAll$class.run(BeforeAndAfterAll.scala:253) at org.apache.flink.api.scala.ScalaShellITSuite.run(ScalaShellITSuite.scala:33) at org.scalatest.Suite$class.callExecuteOnSuite$1(Suite.scala:1492) at org.scalatest.Suite$$anonfun$runNestedSuites$1.apply(Suite.scala:1528) at org.scalatest.Suite$$anonfun$runNestedSuites$1.apply(Suite.scala:1526) at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33) at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:108) at org.scalatest.Suite$class.runNestedSuites(Suite.scala:1526) at org.scalatest.tools.DiscoverySuite.runNestedSuites(DiscoverySuite.scala:29) at org.scalatest.Suite$class.run(Suite.scala:1421) at org.scalatest.tools.DiscoverySuite.run(DiscoverySuite.scala:29) at org.scalatest.tools.SuiteRunner.run(SuiteRunner.scala:55) at org.scalatest.tools.Runner$$anonfun$doRunRunRunDaDoRunRun$3.apply(Runner.scala:2563) at org.scalatest.tools.Runner$$anonfun$doRunRunRunDaDoRunRun$3.apply(Runner.scala:2557) at scala.collection.immutable.List.foreach(List.scala:318) at org.scalatest.tools.Runner$.doRunRunRunDaDoRunRun(Runner.scala:2557) at org.scalatest.tools.Runner$$anonfun$runOptionallyWithPassFailReporter$2.apply(Runner.scala:1044) at org.scalatest.tools.Runner$$anonfun$runOptionallyWithPassFailReporter$2.apply(Runner.scala:1043) at org.scalatest.tools.Runner$.withClassLoaderAndDispatchReporter(Runner.scala:2722) at org.scalatest.tools.Runner$.runOptionallyWithPassFailReporter(Runner.scala:1043) at org.scalatest.tools.Runner$.main(Runner.scala:860) at org.scalatest.tools.Runner.main(Runner.scala) {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
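[Editor's note] For context on where the odd {{/.log}} path can come from: log4j 1.x's {{PropertyConfigurator}} substitutes an unset {{${...}}} variable with the empty string, so a {{File}} option built from such a variable collapses to a path at the filesystem root, which an unprivileged CI user cannot create. The fragment below is a hypothetical sketch; the property and variable names are made up and may differ from Flink's actual test logging configuration.

```properties
# Hypothetical sketch, not Flink's real test config: if ${log.dir}
# is never set as a system property, log4j substitutes the empty
# string and the File option resolves to "/.log". Creating that file
# at the filesystem root then fails with
# java.io.FileNotFoundException: /.log (Permission denied).
log4j.rootLogger=INFO, file
log4j.appender.file=org.apache.log4j.FileAppender
log4j.appender.file.File=${log.dir}/.log
log4j.appender.file.layout=org.apache.log4j.PatternLayout
```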
[GitHub] flink pull request: [FLINK-1319][core] Add static code analysis fo...
Github user uce commented on a diff in the pull request: https://github.com/apache/flink/pull/729#discussion_r31531993 --- Diff: flink-java/src/main/java/org/apache/flink/api/java/operators/SingleInputUdfOperator.java --- @@ -54,8 +54,11 @@ private Map<String, DataSet<?>> broadcastVariables; + // NOTE: only set this variable via setSemanticProperties() --- End diff -- I saw that there is a check for this in the UdfAnalyzerUtil. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[jira] [Commented] (FLINK-1993) Replace MultipleLinearRegression's custom SGD with optimization framework's SGD
[ https://issues.apache.org/jira/browse/FLINK-1993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14569220#comment-14569220 ] ASF GitHub Bot commented on FLINK-1993: --- GitHub user tillrohrmann opened a pull request: https://github.com/apache/flink/pull/760 [FLINK-1993] [ml] Replaces custom SGD in MultipleLinearRegression with optimizer's SGD This PR replaces the custom SGD implementation with the optimization framework's SGD implementation. You can merge this pull request into a Git repository by running: $ git pull https://github.com/tillrohrmann/flink replaceSGDInMLR Alternatively you can review and apply these changes as the patch at: https://github.com/apache/flink/pull/760.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #760 commit 2df0c700237af0456c3770c198ed595fdf205408 Author: Till Rohrmann trohrm...@apache.org Date: 2015-05-29T16:02:47Z [FLINK-1993] [ml] Replaces custom SGD logic with optimization framework's SGD in MultipleLinearRegression Fixes PipelineITSuite because of change MLR loss function Replace MultipleLinearRegression's custom SGD with optimization framework's SGD --- Key: FLINK-1993 URL: https://issues.apache.org/jira/browse/FLINK-1993 Project: Flink Issue Type: Task Components: Machine Learning Library Reporter: Till Rohrmann Assignee: Theodore Vasiloudis Priority: Minor Labels: ML Fix For: 0.9 The current implementation of MultipleLinearRegression uses a custom SGD implementation. Flink's optimization framework also contains a SGD optimizer which should replace the custom implementation once the framework is merged. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[GitHub] flink pull request: [FLINK-1319][core] Add static code analysis fo...
Github user uce commented on a diff in the pull request: https://github.com/apache/flink/pull/729#discussion_r31532320 --- Diff: flink-java/src/main/java/org/apache/flink/api/java/sca/UdfAnalyzer.java --- @@ -0,0 +1,431 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * License); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an AS IS BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.flink.api.java.sca; + +import org.apache.flink.api.common.functions.CoGroupFunction; +import org.apache.flink.api.common.functions.CrossFunction; +import org.apache.flink.api.common.functions.FilterFunction; +import org.apache.flink.api.common.functions.FlatJoinFunction; +import org.apache.flink.api.common.functions.FlatMapFunction; +import org.apache.flink.api.common.functions.GroupReduceFunction; +import org.apache.flink.api.common.functions.JoinFunction; +import org.apache.flink.api.common.functions.MapFunction; +import org.apache.flink.api.common.functions.ReduceFunction; +import org.apache.flink.api.common.operators.DualInputSemanticProperties; +import org.apache.flink.api.common.operators.SemanticProperties; +import org.apache.flink.api.common.operators.SingleInputSemanticProperties; +import org.apache.flink.api.common.typeinfo.TypeInformation; +import org.apache.flink.api.java.functions.SemanticPropUtil; +import org.apache.flink.api.java.operators.Keys; +import org.apache.flink.api.java.operators.Keys.ExpressionKeys; +import org.apache.flink.api.java.sca.TaggedValue.Input; +import org.objectweb.asm.Type; +import org.objectweb.asm.tree.MethodNode; + +import java.lang.reflect.Method; +import java.util.ArrayList; +import java.util.List; + +import static org.apache.flink.api.java.sca.UdfAnalyzerUtils.convertTypeInfoToTaggedValue; +import static org.apache.flink.api.java.sca.UdfAnalyzerUtils.findMethodNode; +import static org.apache.flink.api.java.sca.UdfAnalyzerUtils.mergeReturnValues; +import static org.apache.flink.api.java.sca.UdfAnalyzerUtils.removeUngroupedInputsFromContainer; + +public class UdfAnalyzer { + // exclusion to suppress hints for API operators + private static final String EXCLUDED_CLASSPATH = "org/apache/flink"; --- End diff -- Instead of excluding this class path, the consensus from reviews so far is to add an `@SkipCodeAnalysis` annotation. This will allow new users to play around with the Flink examples etc. 
--- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] flink pull request: [FLINK-1319][core] Add static code analysis fo...
Github user uce commented on a diff in the pull request: https://github.com/apache/flink/pull/729#discussion_r31532974 --- Diff: flink-java/src/main/java/org/apache/flink/api/java/operators/UdfOperatorUtils.java --- @@ -0,0 +1,94 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * License); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an AS IS BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.flink.api.java.operators; + +import org.apache.flink.api.common.UdfAnalysisMode; +import org.apache.flink.api.common.functions.Function; +import org.apache.flink.api.common.functions.InvalidTypesException; +import org.apache.flink.api.common.operators.DualInputSemanticProperties; +import org.apache.flink.api.common.operators.SingleInputSemanticProperties; +import org.apache.flink.api.java.sca.UdfAnalyzer; +import org.apache.flink.api.java.sca.UdfAnalyzerException; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +public abstract class UdfOperatorUtils { + + private static final Logger LOG = LoggerFactory.getLogger(UdfOperatorUtils.class); + + public static void analyzeSingleInputUdf(SingleInputUdfOperator?, ?, ? operator, Class? udfBaseClass, + Function udf, Keys? 
key) { + final UdfAnalysisMode mode = operator.getExecutionEnvironment().getConfig().getUdfAnalysisMode(); + if (mode != UdfAnalysisMode.DISABLED) { + try { + final UdfAnalyzer analyzer = new UdfAnalyzer(udfBaseClass, udf.getClass(), operator.getInputType(), null, + operator.getResultType(), key, null, mode == UdfAnalysisMode.OPTIMIZING_ENABLED); + final boolean success = analyzer.analyze(); + if (success) { + if (mode == UdfAnalysisMode.OPTIMIZING_ENABLED + && !operator.udfWithForwardedFieldsAnnotation(udf.getClass())) { + operator.setSemanticProperties((SingleInputSemanticProperties) analyzer.getSemanticProperties()); + operator.setAnalyzedUdfSemanticsFlag(); --- End diff -- I think it would make sense to also print the inferred forwarded fields (at least for debugging purposes). --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[jira] [Commented] (FLINK-1319) Add static code analysis for UDFs
[ https://issues.apache.org/jira/browse/FLINK-1319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14569242#comment-14569242 ] ASF GitHub Bot commented on FLINK-1319: --- Github user uce commented on a diff in the pull request: https://github.com/apache/flink/pull/729#discussion_r31532974 --- Diff: flink-java/src/main/java/org/apache/flink/api/java/operators/UdfOperatorUtils.java --- @@ -0,0 +1,94 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * License); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an AS IS BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.flink.api.java.operators; + +import org.apache.flink.api.common.UdfAnalysisMode; +import org.apache.flink.api.common.functions.Function; +import org.apache.flink.api.common.functions.InvalidTypesException; +import org.apache.flink.api.common.operators.DualInputSemanticProperties; +import org.apache.flink.api.common.operators.SingleInputSemanticProperties; +import org.apache.flink.api.java.sca.UdfAnalyzer; +import org.apache.flink.api.java.sca.UdfAnalyzerException; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +public abstract class UdfOperatorUtils { + + private static final Logger LOG = LoggerFactory.getLogger(UdfOperatorUtils.class); + + public static void analyzeSingleInputUdf(SingleInputUdfOperator?, ?, ? operator, Class? udfBaseClass, + Function udf, Keys? key) { + final UdfAnalysisMode mode = operator.getExecutionEnvironment().getConfig().getUdfAnalysisMode(); + if (mode != UdfAnalysisMode.DISABLED) { + try { + final UdfAnalyzer analyzer = new UdfAnalyzer(udfBaseClass, udf.getClass(), operator.getInputType(), null, + operator.getResultType(), key, null, mode == UdfAnalysisMode.OPTIMIZING_ENABLED); + final boolean success = analyzer.analyze(); + if (success) { + if (mode == UdfAnalysisMode.OPTIMIZING_ENABLED + !operator.udfWithForwardedFieldsAnnotation(udf.getClass())) { + operator.setSemanticProperties((SingleInputSemanticProperties) analyzer.getSemanticProperties()); + operator.setAnalyzedUdfSemanticsFlag(); --- End diff -- I think it would make sense to also print the inferred forwarded fields (at least for debugging purposes). Add static code analysis for UDFs - Key: FLINK-1319 URL: https://issues.apache.org/jira/browse/FLINK-1319 Project: Flink Issue Type: New Feature Components: Java API, Scala API Reporter: Stephan Ewen Assignee: Timo Walther Priority: Minor Flink's Optimizer takes information that tells it for UDFs which fields of the input elements are accessed, modified, or frwarded/copied. 
This information frequently helps to reuse partitionings, sorts, etc. It may speed up programs significantly, as it can frequently eliminate sorts and shuffles, which are costly. Right now, users can add lightweight annotations to UDFs to provide this information (such as adding {{@ConstantFields(0->3, 1, 2->1)}}). We worked with static code analysis of UDFs before, to determine this information automatically. This is an incredible feature, as it magically makes programs faster. For record-at-a-time operations (Map, Reduce, FlatMap, Join, Cross), this works surprisingly well in many cases. We used the Soot toolkit for the static code analysis. Unfortunately, Soot is LGPL licensed and thus we did not include any of the code so far. I propose to add this functionality to Flink, in the form of a drop-in addition, to work around the LGPL incompatibility with ASL 2.0. Users could simply download a special flink-code-analysis.jar and drop it into the lib
[jira] [Commented] (FLINK-1319) Add static code analysis for UDFs
[ https://issues.apache.org/jira/browse/FLINK-1319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14569218#comment-14569218 ] ASF GitHub Bot commented on FLINK-1319: --- Github user uce commented on a diff in the pull request: https://github.com/apache/flink/pull/729#discussion_r31531758 --- Diff: flink-java/src/main/java/org/apache/flink/api/java/operators/UdfOperatorUtils.java --- @@ -0,0 +1,94 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * License); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an AS IS BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.flink.api.java.operators; + +import org.apache.flink.api.common.UdfAnalysisMode; +import org.apache.flink.api.common.functions.Function; +import org.apache.flink.api.common.functions.InvalidTypesException; +import org.apache.flink.api.common.operators.DualInputSemanticProperties; +import org.apache.flink.api.common.operators.SingleInputSemanticProperties; +import org.apache.flink.api.java.sca.UdfAnalyzer; +import org.apache.flink.api.java.sca.UdfAnalyzerException; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +public abstract class UdfOperatorUtils { + + private static final Logger LOG = LoggerFactory.getLogger(UdfOperatorUtils.class); + + public static void analyzeSingleInputUdf(SingleInputUdfOperator?, ?, ? operator, Class? udfBaseClass, --- End diff -- I vote to pass the name of the operator as well. The log output will then be more consistent. Currently the class name is used. Add static code analysis for UDFs - Key: FLINK-1319 URL: https://issues.apache.org/jira/browse/FLINK-1319 Project: Flink Issue Type: New Feature Components: Java API, Scala API Reporter: Stephan Ewen Assignee: Timo Walther Priority: Minor Flink's Optimizer takes information that tells it for UDFs which fields of the input elements are accessed, modified, or frwarded/copied. This information frequently helps to reuse partitionings, sorts, etc. It may speed up programs significantly, as it can frequently eliminate sorts and shuffles, which are costly. Right now, users can add lightweight annotations to UDFs to provide this information (such as adding {{@ConstandFields(0-3, 1, 2-1)}}. We worked with static code analysis of UDFs before, to determine this information automatically. This is an incredible feature, as it magically makes programs faster. For record-at-a-time operations (Map, Reduce, FlatMap, Join, Cross), this works surprisingly well in many cases. We used the Soot toolkit for the static code analysis. 
Unfortunately, Soot is LGPL licensed and thus we did not include any of the code so far. I propose to add this functionality to Flink, in the form of a drop-in addition, to work around the LGPL incompatibility with ASL 2.0. Users could simply download a special flink-code-analysis.jar and drop it into the lib folder to enable this functionality. We may even add a script to tools that downloads that library automatically into the lib folder. This should be legally fine, since we do not redistribute LGPL code and only dynamically link it (the incompatibility with ASL 2.0 is mainly in the patentability, if I remember correctly). Prior work on this has been done by [~aljoscha] and [~skunert], which could provide a code base to start with. *Appendix* Homepage of the Soot static analysis toolkit: http://www.sable.mcgill.ca/soot/ Papers on static analysis and for optimization: http://stratosphere.eu/assets/papers/EnablingOperatorReorderingSCA_12.pdf and http://stratosphere.eu/assets/papers/openingTheBlackBoxes_12.pdf Quick introduction to the Optimizer: http://stratosphere.eu/assets/papers/2014-VLDBJ_Stratosphere_Overview.pdf (Section 6) Optimizer for Iterations: http://stratosphere.eu/assets/papers/spinningFastIterativeDataFlows_12.pdf (Sections 4.3 and 5.3) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
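[Editor's note] To make the "forwarded fields" idea in the issue above concrete, here is a toy sketch in plain Java (not Flink API; class and method names are made up). It shows a record-at-a-time function that leaves field 1 in place and moves field 2 to position 0 — exactly the kind of semantic information an annotation or the proposed static analysis would hand to the optimizer so it can reuse existing partitionings and sort orders on those fields instead of re-shuffling.

```java
// Toy illustration only. The "UDF" maps a 3-field record
// (a, b, c) -> (c, b, a + c). Field 1 is forwarded to position 1
// and field 2 is forwarded to position 0, so semantic information
// in the spirit of "2->0; 1" would apply to it.
public class ForwardedFieldsDemo {

    // record-at-a-time function over int[3] records
    static int[] udf(int[] rec) {
        // rec[2] forwarded to position 0, rec[1] kept at position 1,
        // position 2 is newly computed (NOT forwarded)
        return new int[] { rec[2], rec[1], rec[0] + rec[2] };
    }

    public static void main(String[] args) {
        int[] out = udf(new int[] { 7, 8, 9 });
        System.out.println(out[0] + "," + out[1] + "," + out[2]); // prints 9,8,16
    }
}
```

A sort on field 2 of the input is still a sort on field 0 of the output here, which is the optimization opportunity the annotation exposes.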
[GitHub] flink pull request: [FLINK-1993] [ml] Replaces custom SGD in Multi...
GitHub user tillrohrmann opened a pull request: https://github.com/apache/flink/pull/760 [FLINK-1993] [ml] Replaces custom SGD in MultipleLinearRegression with optimizer's SGD This PR replaces the custom SGD implementation with the optimization framework's SGD implementation. You can merge this pull request into a Git repository by running: $ git pull https://github.com/tillrohrmann/flink replaceSGDInMLR Alternatively you can review and apply these changes as the patch at: https://github.com/apache/flink/pull/760.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #760 commit 2df0c700237af0456c3770c198ed595fdf205408 Author: Till Rohrmann trohrm...@apache.org Date: 2015-05-29T16:02:47Z [FLINK-1993] [ml] Replaces custom SGD logic with optimization framework's SGD in MultipleLinearRegression Fixes PipelineITSuite because of change MLR loss function --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[jira] [Commented] (FLINK-1319) Add static code analysis for UDFs
[ https://issues.apache.org/jira/browse/FLINK-1319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14569271#comment-14569271 ] ASF GitHub Bot commented on FLINK-1319: --- Github user uce commented on a diff in the pull request: https://github.com/apache/flink/pull/729#discussion_r31535397 --- Diff: flink-java/src/main/java/org/apache/flink/api/java/operators/UdfOperatorUtils.java --- @@ -0,0 +1,94 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * License); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an AS IS BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.flink.api.java.operators; + +import org.apache.flink.api.common.UdfAnalysisMode; +import org.apache.flink.api.common.functions.Function; +import org.apache.flink.api.common.functions.InvalidTypesException; +import org.apache.flink.api.common.operators.DualInputSemanticProperties; +import org.apache.flink.api.common.operators.SingleInputSemanticProperties; +import org.apache.flink.api.java.sca.UdfAnalyzer; +import org.apache.flink.api.java.sca.UdfAnalyzerException; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +public abstract class UdfOperatorUtils { + + private static final Logger LOG = LoggerFactory.getLogger(UdfOperatorUtils.class); + + public static void analyzeSingleInputUdf(SingleInputUdfOperator?, ?, ? operator, Class? udfBaseClass, + Function udf, Keys? key) { + final UdfAnalysisMode mode = operator.getExecutionEnvironment().getConfig().getUdfAnalysisMode(); + if (mode != UdfAnalysisMode.DISABLED) { + try { + final UdfAnalyzer analyzer = new UdfAnalyzer(udfBaseClass, udf.getClass(), operator.getInputType(), null, + operator.getResultType(), key, null, mode == UdfAnalysisMode.OPTIMIZING_ENABLED); + final boolean success = analyzer.analyze(); + if (success) { + if (mode == UdfAnalysisMode.OPTIMIZING_ENABLED + !operator.udfWithForwardedFieldsAnnotation(udf.getClass())) { + operator.setSemanticProperties((SingleInputSemanticProperties) analyzer.getSemanticProperties()); + operator.setAnalyzedUdfSemanticsFlag(); + } + else if (mode == UdfAnalysisMode.HINTING_ENABLED) { + analyzer.addSemanticPropertiesHints(); + } + LOG.info(analyzer.getHintsString()); + } + } + catch (InvalidTypesException e) { + LOG.debug(Unable to do UDF analysis due to missing type information., e); + } + catch (UdfAnalyzerException e) { + LOG.debug(UDF analysis failed., e); + } + } + } + + public static void analyzeDualInputUdf(TwoInputUdfOperator?, ?, ?, ? operator, Class? udfBaseClass, + Function udf, Keys? key1, Keys? 
key2) { + final UdfAnalysisMode mode = operator.getExecutionEnvironment().getConfig().getUdfAnalysisMode(); + if (mode != UdfAnalysisMode.DISABLED) { --- End diff -- We could log that the analysis is disabled as well. Add static code analysis for UDFs - Key: FLINK-1319 URL: https://issues.apache.org/jira/browse/FLINK-1319 Project: Flink Issue Type: New Feature Components: Java API, Scala API Reporter: Stephan Ewen Assignee: Timo Walther Priority: Minor Flink's Optimizer takes information that tells it for UDFs which
[GitHub] flink pull request: [FLINK-1319][core] Add static code analysis fo...
Github user uce commented on a diff in the pull request: https://github.com/apache/flink/pull/729#discussion_r31535397 --- Diff: flink-java/src/main/java/org/apache/flink/api/java/operators/UdfOperatorUtils.java --- @@ -0,0 +1,94 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * License); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an AS IS BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.flink.api.java.operators; + +import org.apache.flink.api.common.UdfAnalysisMode; +import org.apache.flink.api.common.functions.Function; +import org.apache.flink.api.common.functions.InvalidTypesException; +import org.apache.flink.api.common.operators.DualInputSemanticProperties; +import org.apache.flink.api.common.operators.SingleInputSemanticProperties; +import org.apache.flink.api.java.sca.UdfAnalyzer; +import org.apache.flink.api.java.sca.UdfAnalyzerException; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +public abstract class UdfOperatorUtils { + + private static final Logger LOG = LoggerFactory.getLogger(UdfOperatorUtils.class); + + public static void analyzeSingleInputUdf(SingleInputUdfOperator?, ?, ? operator, Class? udfBaseClass, + Function udf, Keys? 
key) { + final UdfAnalysisMode mode = operator.getExecutionEnvironment().getConfig().getUdfAnalysisMode(); + if (mode != UdfAnalysisMode.DISABLED) { + try { + final UdfAnalyzer analyzer = new UdfAnalyzer(udfBaseClass, udf.getClass(), operator.getInputType(), null, + operator.getResultType(), key, null, mode == UdfAnalysisMode.OPTIMIZING_ENABLED); + final boolean success = analyzer.analyze(); + if (success) { + if (mode == UdfAnalysisMode.OPTIMIZING_ENABLED + && !operator.udfWithForwardedFieldsAnnotation(udf.getClass())) { + operator.setSemanticProperties((SingleInputSemanticProperties) analyzer.getSemanticProperties()); + operator.setAnalyzedUdfSemanticsFlag(); + } + else if (mode == UdfAnalysisMode.HINTING_ENABLED) { + analyzer.addSemanticPropertiesHints(); + } + LOG.info(analyzer.getHintsString()); + } + } + catch (InvalidTypesException e) { + LOG.debug("Unable to do UDF analysis due to missing type information.", e); + } + catch (UdfAnalyzerException e) { + LOG.debug("UDF analysis failed.", e); + } + } + } + + public static void analyzeDualInputUdf(TwoInputUdfOperator<?, ?, ?, ?> operator, Class<?> udfBaseClass, + Function udf, Keys<?> key1, Keys<?> key2) { + final UdfAnalysisMode mode = operator.getExecutionEnvironment().getConfig().getUdfAnalysisMode(); + if (mode != UdfAnalysisMode.DISABLED) { --- End diff -- We could log that the analysis is disabled as well. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---